Preface


The purpose of this study was to develop a complexity metric or set of metrics that would be useful in measuring software maintainability. A set of interesting metrics was assembled from current literature, and a series of criteria was developed to measure how well each metric measures maintainability. Applying the criteria to the metrics identified the pair of metrics that best measures maintainability.

Once the maintainability metrics were decided, rules for their implementation were given. A method to determine a threshold value was explained so that valid ranges of values could be recommended. A metric validation process was proposed to gather data that will reveal whether the metrics actually reflect maintainability.

I would like to thank several people who have given me support and guidance throughout this thesis effort. I am very grateful to my advisor, Major James Howatt, for his guidance in helping me narrow down my goals when I began research, his assistance throughout the development of this thesis, and his patience when I missed deadlines. I also wish to thank Captain Wade Shaw for explaining how important the metric threshold and validation analysis process is. A debt of gratitude is owed to Captain David Umphress, whose editorial comments greatly enhanced the readability of this thesis. I would also like to thank my sponsor, Captain Mike McPherson, and Mr. Jim Baca of the Air Force Operational Test and Evaluation Center for their support and direction. Finally, I would like to thank all of my fellow students in the Computer Engineering section, who made the "AFIT Experience" unforgettable.

Abstract

The purpose of this study was to survey automatable software maintainability metrics for inclusion in the Air Force Operational Test and Evaluation Center's (AFOTEC's) software maintainability evaluations. This research sought metrics that would measure maintainability, could be automated, and would fit into existing guidelines. First, a set of software complexity metrics was investigated. Then, a set of criteria to determine whether a complexity metric measures maintainability was developed. Comparing the metrics against the criteria yielded a subset of two metrics that met the criteria better than any other metrics.

The software complexity metrics evaluated were placed into three categories: size metrics, structure metrics, and hybrid metrics. The structure metrics include both data structure and control structure metrics. The hybrid metrics include metrics blended from two of the other groups, such as a combination of size and structure metrics.

The metric selection criteria fell into three categories: general applicability criteria, control flow complexity criteria, and data flow complexity criteria. It was assumed that the metric or combination of metrics that met the most criteria would best reflect software maintainability.

A combination of a data structure metric, information flow, and a control structure metric, MEasurement Based on Weights (MEBOW), was determined to meet more criteria than any other metric or combination of metrics. This hybrid metric was suggested for AFOTEC use.

Further information explaining the theoretical and empirical justification for the use of these metrics was given. Techniques to determine metric threshold values were described, along with a procedure for metric validation. Finally, the limitations inherent in measuring maintainability with automated metrics were discussed.


MODIFYING AFOTEC’S SOFTWARE MAINTAINABILITY EVALUATION GUIDELINES

I. Introduction

Software metrics are tools to measure the intrinsic complexity of software systems in order to gauge the software design's "quality and effectiveness" (Prather, 1984:340). The quality of software should be measured to determine whether it is both testable and maintainable (McCabe, 1983:3). These issues are important because testing consumes a large share of software development time, and software maintenance accounts for between 50 and 75 percent of software life-cycle costs (Henry and Kafura, 1981:510).

The Air Force Operational Test and Evaluation Center (AFOTEC) is responsible for testing software being developed for the Air Force. It uses software metrics to determine whether software is maintainable. This thesis describes additional metrics that AFOTEC should use to measure maintainability. The use of these additional metrics will complement the current software evaluation guidelines.

Background 

AFOTEC evaluates source code and documentation for the presence of seven maintainability characteristics: modularity, descriptiveness, consistency, simplicity, expandability, testability, and traceability. Each is described later. Standardized questionnaires are filled out by software engineers who are "knowledgeable in software procedures, techniques, and maintenance, but need not have a detailed knowledge of the functional area of the program" (Peercy, 1981:343). The evaluators answer the questions within the questionnaire with respect to the software. Responses are analyzed and averaged to yield a maintainability rating.

Appraising software by this technique provides several advantages. This evaluation method can be used on any type of software, regardless of the implementation language. Weaknesses and strengths can be highlighted at any level, between subsystems within the software, or in comparison among systems. Because both source code and documentation are considered, any discrepancies between how the specification says the software is constructed and how it is actually implemented can be discovered. AFOTEC's analysis of historical data suggests that its evaluation results correlate well with how difficult the software was to maintain, which implies that the current process does measure software maintainability.

This evaluation method has the disadvantage of being labor intensive. It requires evaluators to perform time-consuming activities, such as counting the operands and operators in the source code, which makes assessing software with this method expensive. Because the evaluation is done manually, typically only about ten percent of the source code in large programs is examined. If this process were automated, all of the code could be measured, and the procedures shown to be more complex could later be evaluated in more detail using AFOTEC's current guidelines. Another drawback is that the methodology does not consider the overall software design. Instead of judging design issues such as the connections between modules and how well the software has been modularized, the evaluation method treats each module as a separate entity. If metrics that measure design complexity are used, the complexity of the inter-module data passing and the program calling structure can be considered. To eliminate these problems, AFOTEC should use additional software metrics to grade software, and these metrics should be automated to reduce the evaluators' workload.

Summary of the AFOTEC Software Maintainability Guidelines


The following definitions describe what AFOTEC's guidelines are trying to measure; the criteria used to measure maintainability are then detailed. ANSI/IEEE Standard 729 (Schneidewind, 1987:303) states:

Maintenance: Modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a changed environment.

Maintainability: The ease with which a software system can be corrected when errors or deficiencies occur, and can be expanded or contracted to satisfy new requirements.

The AFOTEC pamphlet 800-2, Vol. 3, "Software Maintainability - Evaluation Guide" (referred to as Vol. 3 from now on), provides standardized questionnaires that assess maintainability with respect to software source code and documentation. Quoting from Vol. 3 itself, "These questionnaires [the Vol. 3] are designed to determine the presence or absence of certain desirable attributes in a given software product" (AFOTEC, 1988:1). These desirable attributes are the seven characteristics:

Modularity: "Software possesses the characteristic of modularity to the extent a logical partitioning of software into parts, components, and/or modules has occurred" (AFOTEC, 1988:5). The documentation is evaluated to determine whether it is partitioned into separate parts or volumes that each have a distinct purpose. Similarly, source code is evaluated to determine the extent to which structured programming techniques are used.

Descriptiveness: "Software possesses the characteristic of descriptiveness to the extent that it contains information regarding its objectives, assumptions, inputs, processing, outputs, components, revision status, etc." (AFOTEC, 1988:5). This characteristic is used to measure how well the software's design and operation are described. Self-descriptive source language constructs and accompanying comments can facilitate efforts to understand program operation.

Consistency: "Software possesses the characteristic of consistency to the extent the software products correlate and contain uniform notation, terminology, and symbology" (AFOTEC, 1988:5). This characteristic is used to measure how well the software designers followed standards in creating documentation and how well coding conventions were followed. The use of a naming convention for global data and a standard indentation convention fall under this characteristic.

Simplicity: "Software possesses the characteristic of simplicity to the extent that it reflects the use of singular concepts and fundamental structures in organization, language, and implementation techniques" (AFOTEC, 1988:6). Simplicity is the overall guideline against which size and control flow are measured.

Expandability: "Software possesses the characteristic of expandability to the extent that a physical change to information, computational functions, data storage or execution time can be easily accomplished once the nature of what is to be changed is understood" (AFOTEC, 1988:6). Measuring expandability shows how much room for growth has been designed into a program in relation to its storage space, timing requirements, etc.

Testability: "Software possesses the characteristic of testability to the extent it contains aids which enhance testing" (AFOTEC, 1988:6). It is important that the software be instrumented for testing after modification, so that correct program execution can be verified and validated.

Traceability: "Software possesses the characteristic of traceability to the extent that information regarding all program elements, and their implementation, can be traced between all levels of lesser and greater detail" (AFOTEC, 1988:7). This characteristic measures how easily a maintainer can trace the operation of a module to its documentation and can follow functions in the documentation to the modules that perform them.

These characteristics form the criteria for analyzing what makes software more maintainable. While they are not the only characteristics that can be measured to assess software maintainability (others include reliability, modifiability, etc.), they appear to be a representative sample of software quality characteristics (Boehm and others, 1980:229-231; Peercy, 1981:343-344).

Problem 

The AFOTEC evaluation guidelines are labor intensive and expensive to implement, and they do not evaluate the overall software design. Although different types of systems may require a different emphasis, no procedures exist to weight the seven characteristics. Also, only a fraction of the delivered code is fully assessed.

Approach 

Specific additional software maintainability metrics will be identified for incorporation into Vol. 3. These metrics will measure aspects of maintainability that are presently not adequately covered. They will be automatable so that no additional labor is added to the evaluators' workload. Algorithms for a program that measures these metrics will be developed, although the actual measurement tool will not be built.

Assumptions 

Two assumptions were made about the researched software metrics. First, the metrics must enhance the measurement of software maintainability; there are many different types of software metrics, and some measure factors other than those related to maintainability. Second, the metrics must fit into the scope of the desirable attributes being measured. These desirable attributes were described in a previous section.

Scope 

Given the constraints explained in the previous section, the software metrics suggested to AFOTEC will be limited to metrics that:

1. Will fit into the Vol. 3 process.

2. Can be automated.

3. AFOTEC can be convinced to adopt, including acquiring a tool to automate their calculation.

A plan to validate these new metrics will be proposed. Algorithms showing how these metrics should measure source code will be developed.

Sequence of Presentation

Chapter Two presents a review of classic and newer software metrics. Chapter Three discusses criteria to determine which metrics measure maintainability. Chapter Four describes in detail which metrics will be suggested to AFOTEC and how these metrics should be incorporated within the existing guidelines of Vol. 3, along with a method to validate the metrics' use. Chapter Five presents my conclusions and recommendations for further research.

II. Literature Review


Introduction

Research to date has not found metrics that specifically measure program maintainability; most metrics measure program complexity. Complexity can be defined as "a characteristic of the software interface which influences the resources another system will expend or commit while interacting with the software" (Conte and others, 1986:17). According to Harrison, "maintenance is most affected by program complexity" (Harrison and others, 1982:65). Because program complexity contributes greatly to maintainability, I will use complexity metrics to measure maintainability.

This chapter presents a review of classic and newer software metrics. Classic metrics are pioneering work, such as Lines of Code, Halstead's Software Science, and McCabe's cyclomatic complexity, that has been extensively examined in the literature (Cote and others, 1988:121). The metrics are presented in three sections: size metrics, data and control structure metrics, and composite or hybrid metrics. This list of metrics is not intended to be exhaustive, but to show what factors of program complexity are measured and the metrics that attempt to measure these factors.

Size Metrics

Size metrics measure program size and reflect the fact that the volume of information to be studied to understand a program contributes to its complexity (Harrison and others, 1982:66). Because the effort needed to develop a program largely depends on the amount of code written, size measures have been used to assess the amount of effort required. Program size is important for three reasons (Conte and others, 1986:32):


1. It is easy to compute after the program is completed.

2. It is the most important factor for many models of software development.

3. It is the basis of most productivity measures.

Lines of Code. The earliest and most familiar software size measure is the number of lines of source code (Levitin, 1986:314). This measure is labeled S and is measured in lines of code (LOC) or thousands of lines of code (KLOC). While this may seem to be a very simple and easily calculated metric, much debate has centered on how LOC should be counted.

This measure is natural for some languages, such as assembly languages and FORTRAN, which have close to a one-to-one correspondence between the number of statements and the lines of a program. Newer languages that allow a freer format cannot be counted so easily. For example, Figure 1 shows two code fragments that are functionally equivalent but have different LOC counts.


Figure 1. Differing Statement Counts for the Same Function
(Levitin, 1986:315)


These examples have LOC counts of seven and four. Because they are semantically the same program, the LOC count must be measuring the size of the program's representation instead of the actual size of the program (Levitin, 1986:315). Another problem with the LOC count is that a program can be padded with blank lines and comments to give artificially high LOC counts.

Many languages require descriptive non-executable statements, such as COBOL's ENVIRONMENT division or Pascal's Var section (Conte and others, 1986:35). Some researchers have suggested that since these are not executable statements, they should not be counted in the LOC. Others have said that since understanding a program's data is critical to understanding the operation of the program, the variable declarations and program headers should be included in the LOC count. A simple solution to this problem is to adopt a consistent counting scheme and always use it to count LOC. An example of a counting strategy comes from Conte: "A line of code is any line of program text that is not a comment or blank line, regardless of the number of statements or fragments of statements on the line. This specifically includes all lines containing program headers, declarations, and executable and non-executable statements" (Conte and others, 1986:35).
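
Conte's rule is mechanical enough to automate. As a minimal sketch, not a tool described in the cited sources, the Python function below applies it to a source string. The name `count_loc` and its `comment_prefixes` parameter are hypothetical, and the sketch assumes comments occupy whole lines beginning with Pascal's `{` or `(*` delimiters, so multi-line comments and comments trailing code are not handled. The two fragments are hypothetical rather than those of Figure 1, but they show the same effect: identical statements laid out differently yield different counts.

```python
def count_loc(source: str, comment_prefixes=("{", "(*")) -> int:
    """Count lines of code using Conte's rule: every line that is neither
    blank nor a comment counts once, however many statements it holds.

    Simplifying assumption: a comment line is one whose first non-blank
    characters are a comment prefix; multi-line comments and comments that
    trail code on the same line are not handled specially.
    """
    loc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line: never counted
        if stripped.startswith(comment_prefixes):
            continue  # whole-line comment: never counted
        loc += 1  # headers, declarations, and statements all count
    return loc


# Two layouts of the same hypothetical Pascal fragment.
spread_out = """
{ swap x and y }
temp := x;
x := y;
y := temp;
"""
compact = "temp := x; x := y; y := temp;"

print(count_loc(spread_out), count_loc(compact))  # 3 1
```

The comment line in `spread_out` is excluded, while the compact layout collapses three statements into a single counted line.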

LOC is not a context-sensitive software metric. For example, twenty lines of conditional statements that control the manipulation of dynamic memory constructs will be inherently more complex than twenty lines of simple variable assignment statements, yet under the LOC measure both will have the same count.

Halstead's Software Science (N). Dr. Maurice Halstead developed a measure of program size as part of his Software Science metrics. This metric counts the operators and operands in a program (Levitin, 1986:316). Operators include arithmetic and logic symbols such as + and -, functions, and delimiters. Operands include variables, constants, labels, and any other symbol that represents data.


Halstead's program length metric, N, is calculated from two basic quantities (Conte and others, 1986:37):

N = N1 + N2 (1)

where

N1 = the total number of operators

N2 = the total number of operands

As with LOC, there is some difficulty determining what should be counted as an operator and what should be counted as an operand. In some languages, such as LISP, the difference between operators and operands is not clear. In a procedural language like Pascal or Ada, a function that is embedded in another function, such as "WRITE(COS(25))", can be considered both an operator and an operand. The COS function is an operator because it operates on its input data, but it is also an operand because its resulting value is used as data for the WRITE function.

Halstead originally stated in his counting rules that input/output statements and program declarations should not be counted. Also, the statement labels used as branching addresses for GOTO statements were not considered operands, but an integral part of the GOTOs that branched to the label. Current research suggests that Software Science counting rules should include the symbols in the declaration and input/output statements, and should count each distinct label as another operand (Shen and others, 1983:157).

Figure 2 shows Ramamurthy and Melton's example of counting operands and operators (Ramamurthy and Melton, 1986:309). The guideline of counting operators within the declaration statement is not followed in this example. The eleven operators are "BEGIN END", "readln", "()", ",", ";", ":=", "+", "*", "-", "writeln", and ".". These operators are used 23 times. The five operands, a, b, c, d, and m, are those listed after the VAR statement. These operands are used 18 times. This gives values of 23 for N1, 18 for N2, and 41 for N.


PROGRAM1 (input, output);
VAR
    a, b, c, d, m : integer;
BEGIN
    readln(a, b, c, d);
    a := a + b;
    b := a + c;
    c := b * d;
    m := a + b - c;
    writeln(m)
END.


Figure 2. Halstead's N Example
(Ramamurthy and Melton, 1986:309)
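
The Figure 2 tally can be reproduced mechanically. The Python sketch below illustrates the counting under stated assumptions and is not Ramamurthy and Melton's procedure: it tokenizes the program body with a simple regular expression, counts "BEGIN END" and "()" once per pair, treats `readln` and `writeln` as operators, and treats every other identifier or number as an operand. The function name `halstead_counts` and the `OPERATOR_WORDS` table are hypothetical and sized to this example only.

```python
import re
from collections import Counter

# Body of the Figure 2 program; the header and VAR declaration are left out,
# matching the example's convention of not counting declarations.
BODY = """
BEGIN
  readln(a,b,c,d);
  a := a + b;
  b := a + c;
  c := b * d;
  m := a + b - c;
  writeln(m)
END.
"""

# Identifiers treated as operators (an assumption sized to this example,
# not a complete Pascal operator table).
OPERATOR_WORDS = {"READLN", "WRITELN"}
# Closing halves of paired operators; only the opening half is counted.
PAIR_CLOSERS = {"END", ")"}


def halstead_counts(body: str):
    """Return (operator Counter, operand Counter) for a Pascal-like body."""
    tokens = re.findall(r":=|[A-Za-z_]\w*|\d+|[^\s\w]", body)
    operators, operands = Counter(), Counter()
    for tok in tokens:
        key = tok.upper()
        if key in PAIR_CLOSERS:
            continue  # "BEGIN END" and "()" are each counted once per pair
        if key == "BEGIN":
            operators["BEGIN END"] += 1
        elif tok == "(":
            operators["()"] += 1
        elif key in OPERATOR_WORDS:
            operators[tok] += 1
        elif tok[0].isalnum() or tok[0] == "_":
            operands[tok] += 1  # variables and constants
        else:
            operators[tok] += 1  # punctuation and arithmetic symbols
    return operators, operands


ops, opnds = halstead_counts(BODY)
n1_total, n2_total = sum(ops.values()), sum(opnds.values())
# distinct operators, N1, distinct operands, N2, N
print(len(ops), n1_total, len(opnds), n2_total, n1_total + n2_total)
# 11 23 5 18 41
```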

While Halstead's overall theory of Software Science has been criticized as having no valid theoretical basis (Hamer and Frewin, 1982:198), the N measure has not been faulted as some other metrics have been. Shen states "there is a large amount of empirical evidence to suggest [N's] validity, although it appears to work best in the range of N between 2000 and 4000 for programs written in Fortran, Cobol, and PL/S" (Shen and others, 1983:163). But like LOC, this measure is not context sensitive, in that it does not weight some operators as being inherently more complex than others.


Structure Metrics

This category investigates the system design structure, which comprises the data relationships among system components and the control flow within system components. Some structure metrics have an advantage over size metrics because they can be applied early in the system life cycle, since they are based on higher-level design features rather than the actual source code (Kafura and Canning, 1985:379). Some control structure metrics can evaluate the complexity of a program's structure using its Program Design Language (PDL), which is available before the source code. Some data structure metrics can evaluate a program's complexity if the program's data flows are known before coding.

Data Structure Metrics. One set of factors that affects the complexity of a program is the amount of data the program uses, how that data is used, and its configuration within the program.

Span. Span is a measure of the "number of statements between two successive references to the same variable" (Conte and others, 1986:56). A large span increases the difficulty of determining the value of the variable at any point, and could require a maintenance programmer to search through many lines of source code to understand a variable's usage (Harrison and others, 1982:67).

According to Harrison (Harrison and others, 1982:67), there is no empirical evidence that span represents the complexity, and therefore the maintainability, of a program. He does state, however, that span is intuitively appealing, because a variable with a large span is inherently difficult to keep track of. Figure 3 shows Harrison's example of span between data references.
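
Figure 3's statements are not reproduced here, so the Python sketch below uses four hypothetical assignment statements. It assumes span means the count of statements strictly between two successive references (so references in adjacent statements have span 0) and, as a simplification, treats every identifier in a statement as a variable reference; a real tool would need a parser to separate variables from keywords and procedure names. The function name `spans` is hypothetical.

```python
import re
from collections import defaultdict


def spans(statements):
    """Map each variable name to the spans between its successive references.

    `statements` is a list of source statements in program order. Span is
    taken to be the number of statements strictly between two successive
    references to the same name (an assumed reading of the definition).
    """
    last_seen = {}               # name -> index of its most recent reference
    result = defaultdict(list)   # name -> list of spans, in program order
    for index, stmt in enumerate(statements):
        # Simplification: every identifier counts as a variable reference.
        for name in set(re.findall(r"[A-Za-z_]\w*", stmt)):
            if name in last_seen:
                result[name].append(index - last_seen[name] - 1)
            last_seen[name] = index
    return dict(result)


STATEMENTS = [
    "x := 1",
    "y := 2",
    "z := y + 1",
    "y := x + z",
]
print(spans(STATEMENTS)["x"])  # [2]
```

Here `x` is referenced in the first and last statements, giving a span of 2, the kind of gap a maintainer would have to read across to follow the variable.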

Information Flow. The previous metrics measure complexity within a single module. Because many programs contain more than one module, some way to measure the connections among modules is needed. Information flow is a method to measure the sharing of data among modules. The information flow metric captures properties of module connections that are more detailed than just "calling" relations: it measures the amount of data that flows into a module and the amount that the module passes on or modifies.

Figure 3. Span Example
(Harrison and others, 1982:66)


Harrison and Cook describe information flow as a "macrolevel" metric (Harrison and Cook, 1987:215). A macrolevel metric determines the interrelationships of the subprograms in order to understand the behavior of the overall system. These metrics "concentrate on the communication links between subprograms--the more links, the more complex the macrolevel understanding" (Harrison and Cook, 1987:215). A potential problem with each link is that it may introduce "side effects" into other system subprograms (Harrison and Cook, 1987:215).

The information flow complexity value for a module is determined by two factors: the complexity of the module's code and the complexity of the module's connections to its environment (Henry and Kafura, 1981:513). The complexity of the connections is evaluated by the module's fan-in and fan-out. The fan-in is the number of local (parameter) data flows into the module plus the number of global data structures from which the module retrieves information. The fan-out is the number of data flows from the module plus the number of global data structures that the module modifies.

Note that while the complexity of the module's code is mentioned as a factor within information flow, it is disregarded in later calculations. Henry and Kafura state that "code length is only a weak factor in the complexity measure...this factor may be omitted without significant loss of accuracy" (Henry and Kafura, 1981:514). But other empirical validations of this metric (Kafura and Canning; Harrison and Cook) use length as a factor in the information flow calculations. Kafura and Canning refer to the use of length (LOC) with information flow as "weighted information flow" and consider it a hybrid (Kafura and Canning, 1985:380).

Henry and Kafura evaluated different formulas to calculate the complexity of the modules in the Unix kernel. The formulas included:


(length ** 2)                         (2)
(fan-in * fan-out)                    (3)
(fan-in * fan-out) ** 2               (4)
(length) * (fan-in * fan-out) ** 2    (5)


where length is the number of lines of text in a procedure, including embedded comments but excluding the comments in the procedure's "header block" (Henry and Kafura, 1981:513).
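
For comparison, the four candidate formulas can be evaluated side by side. This Python sketch simply transcribes formulas (2) through (5); the function name `henry_kafura_variants` and the sample inputs (a hypothetical 10-line procedure with the fan-in of 3 and fan-out of 4 used in the Figure 4 discussion) are assumptions for illustration.

```python
def henry_kafura_variants(length: int, fan_in: int, fan_out: int) -> dict:
    """Evaluate candidate complexity formulas (2) through (5).

    length: lines of text in the procedure, counted as described above.
    fan_in, fan_out: the procedure's information flow counts.
    """
    connections = fan_in * fan_out
    return {
        "(2)": length ** 2,
        "(3)": connections,
        "(4)": connections ** 2,
        "(5)": length * connections ** 2,
    }


print(henry_kafura_variants(length=10, fan_in=3, fan_out=4))
# {'(2)': 100, '(3)': 12, '(4)': 144, '(5)': 1440}
```

Formula (5) combines length with information flow, the kind of combination Kafura and Canning call weighted information flow, while formula (4) drops length entirely.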

They found that "the connections of a procedure to its environment, namely (fan-in * fan-out) ** 2, is an extremely good indicator of complexity" (Henry and Kafura, 1981:516), based on a correlation with the number of changes made to each module. Figure 4 shows an example information flow count using this formula. They state that studies have shown a high correlation between program changes and error occurrences, which relates to maintainability (Henry and Kafura, 1981:515). As further validation of the metric, Kafura also uses information flow to detect outliers in the number of errors and the amount of coding time required for large NASA Fortran projects (Kafura and Canning, 1985:382). Outliers are those components that are more than one standard deviation above the mean for coding time required or for the number of errors they contain. Harrison states that his data "suggests that the metrics [information flow] work quite well in identifying 'extraordinary cases'" (Harrison and Cook, 1987:218).
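
The outlier rule is simple to state in code. The Python sketch below flags components whose value is more than one standard deviation above the mean. The function name `outliers` and the error counts are hypothetical, and the sketch assumes the population standard deviation, since the text does not say which form the studies used.

```python
from statistics import mean, pstdev


def outliers(measurements: dict) -> list:
    """Return components more than one standard deviation above the mean.

    `measurements` maps a component name to a value such as coding hours or
    error count. Uses the population standard deviation (an assumption).
    """
    values = list(measurements.values())
    threshold = mean(values) + pstdev(values)
    return [name for name, value in measurements.items() if value > threshold]


# Hypothetical error counts for five components.
errors = {"A": 2, "B": 3, "C": 2, "D": 12, "E": 1}
print(outliers(errors))  # ['D']
```

With a mean of 4 and a standard deviation of about 4.05, the threshold is about 8.05, so only component D is flagged.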


procedure EXAMPLE1(Input1 : integer;
                   Input2 : integer;
                   var Output1 : integer;
                   var Output2 : integer);
begin
    Output2 := 10;
    Global1 := Input1;
    Output1 := Input1 + Input2 + Global2;
    Global2 := Output1 * Output2;
end;

Figure 4. Information Flow Example


Figure 4 shows an example of source code in a Pascal-like language. The procedure EXAMPLE1 has two local data flows and one global data flow into the procedure, for a total fan-in of three. This procedure has two local data flows and two global data flows out of the procedure, for a total fan-out of four. Note that the global variable Global2 is used as both an input and an output flow of information. The information flow complexity for this module is (3 * 4) ** 2 = 144.
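
The counting in this example can be expressed as a small data structure. In the Python sketch below, the data flows of EXAMPLE1 are listed by hand from Figure 4; the class `ProcedureFlows` and its field names are hypothetical, and extracting these sets automatically from source code would require a parser that this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class ProcedureFlows:
    """Data flows of one procedure, listed by hand (hypothetical structure)."""
    params_in: set = field(default_factory=set)        # parameters read
    params_out: set = field(default_factory=set)       # var parameters assigned
    globals_read: set = field(default_factory=set)     # globals read
    globals_written: set = field(default_factory=set)  # globals modified

    def fan_in(self) -> int:
        return len(self.params_in) + len(self.globals_read)

    def fan_out(self) -> int:
        return len(self.params_out) + len(self.globals_written)

    def complexity(self) -> int:
        # Henry and Kafura's (fan-in * fan-out) ** 2
        return (self.fan_in() * self.fan_out()) ** 2


example1 = ProcedureFlows(
    params_in={"Input1", "Input2"},
    params_out={"Output1", "Output2"},
    globals_read={"Global2"},
    globals_written={"Global1", "Global2"},  # Global2 flows both in and out
)
print(example1.fan_in(), example1.fan_out(), example1.complexity())  # 3 4 144
```

Because Global2 appears in both `globals_read` and `globals_written`, it contributes once to fan-in and once to fan-out, matching the count in the text.
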

Coupling is "the degree of interdependence between two modules" (Page-Jones, 1980:101). Minimal coupling means that each module is as independent as possible from other modules, which indicates that a system has been partitioned appropriately (Page-Jones, 1980:101). The information flow metric can indicate the degree of coupling between procedures via the fan-in and fan-out terms. According to Henry and Kafura, this can reveal three types of problems in a procedure (Henry and Kafura, 1981:514). First, high fan-in or fan-out suggests that a procedure may perform more than one function, which is contrary to structured decomposition rules (Page-Jones, 1980:119). Second, and related to the first, information flow measurements may indicate a procedure that was inadequately refined and needs to be divided into two or more separate procedures. Third, a procedure having high complexity may be a "stress point" that carries a large amount of information traffic; because of its many potential effects on the entire system, such a procedure may be difficult to modify.

Henry and Kafura state that one of the benefits of information flow is that the
data necessary to compute the metric are available during the design phase of
software development (Henry and Kafura, 1981:511). This is a significant
advantage over many of the metrics explained here, which cannot be measured
until the source code has been delivered. Information flow is also independent of
the source language, which means that it is widely applicable.

Control Structure Metrics. These metrics measure how easy the control
structures in a program are to understand. They count the number of control
transfers within a program, or measure how the control transfers are
interrelated.

McCabe's Cyclomatic Complexity. Another classic metric is McCabe's
cyclomatic complexity measure. Kearney notes that "McCabe considers the
program as a directed graph in which the edges are lines of control flow and
the nodes are straight line segments of code. The cyclomatic number represents
the number of linearly independent execution paths through the program"
(Kearney, 1986:1045). The metric measures the number of basic paths through a
program using graph theory to represent the paths instead of actually counting
them, which may be impractical (McCabe, 1983:3).

To calculate a module's cyclomatic complexity, a directed graph "G" is
generated that reflects the control structure of the module. A node corresponds
to a block of sequential code, and an edge corresponds to a control transfer
between nodes. The number of connected components is the number of distinct
procedures, which is typically 1. The formula for calculating the cyclomatic
complexity of a weakly connected flow graph is given in (McCabe, 1983:4) as:

$$v(G) = e - n + 2p \tag{6}$$

where

e = the number of edges
n = the number of nodes
p = the number of connected components

and v(G) is equal to the number of basic paths in the measured program.
Figure 5 presents an example of code from Ramamurthy and Melton, and Figure 6
shows its directed graph representation.

Construction of a directed graph can be time-consuming. Fortunately,
Harlan D. Mills proved that the cyclomatic complexity of a structured program is
one more than the number of decisions (McCabe, 1983:9). This means that v(G)
can "be readily calculated by simply inspecting the program" (Myers, 1977:62),
and automated program scanners have been built to calculate the complexity of
programs.
From Figure 6, the number of nodes is n = 11. The number of edges
connecting these nodes is e = 13, which includes an extra arc from the exit node
to the entry node that is added to create a strongly connected graph. Because
this extra arc adds one to the cyclomatic number, the number of connected
components p is used instead of 2p. The number of connected components is
p = 1. Therefore, $v(G) = e - n + p = 13 - 11 + 1 = 3$. This number can also be
calculated easily by inspecting the program: adding one to the number of
decisions (the two IF statements) gives 3.
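Equation (6) and the Mills shortcut can both be checked with a short Python sketch. The graph below is a hypothetical coarse flow graph for Figure 5 (six nodes, seven edges, no added exit-to-entry arc), not a copy of Figure 6, so its node and edge counts differ from those above while v(G) agrees. The function name is illustrative.

```python
def cyclomatic_complexity(nodes, edges):
    """McCabe's v(G) = e - n + 2p (equation 6).

    nodes: iterable of node ids; edges: list of directed (from, to) pairs.
    p is the number of weakly connected components, found with union-find.
    """
    nodes = list(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the component root, halving the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two endpoints' components
    p = len({find(v) for v in nodes})
    return len(edges) - len(nodes) + 2 * p


# Hypothetical coarse flow graph for Figure 5:
# 0 = readln + IF a > b, 1 = IF b > c, 2 = m := a + b,
# 3 = m := b + c, 4 = m := c + d + a, 5 = writeln(m)
FIGURE5_EDGES = [(0, 1), (0, 4), (1, 2), (1, 3), (2, 5), (3, 5), (4, 5)]

print(cyclomatic_complexity(range(6), FIGURE5_EDGES))  # 3 (two decisions + 1)
```

Passing two disjoint copies of the graph gives p = 2 and v(G) = 6, which is how equation (6) accumulates over several procedures.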


PROGRAM2 (input, output);
VAR
   a, b, c, d, m : integer;
BEGIN
   readln(a, b, c, d);
   IF a > b then
      IF b > c then
         m := a + b
      ELSE
         m := b + c
   ELSE
      m := c + d + a;
   writeln(m)
END.

Figure 5. Example Program
(Ramamurthy and Melton, 1986:309)


Empirical evidence supports the cyclomatic number as a complexity
measure. Curtis stated that cyclomatic complexity is "related to the difficulty
programmers experience in locating errors in code" (Curtis and others, 1980:307).
Henry said the cyclomatic complexity metric is a "useful indicator of the
occurrence of errors" (Henry and others, 1983:130).
Figure 6. Example Directed Graph Representation of Figure 5

(Ramamurthy  and  Melton,  1986:310) 


Shepperd, to the contrary, states that the high correlations obtained
between cyclomatic complexity and errors are invalid because "the fundamental
problem remains that without an explicit underlying model the empirical
'validation' is meaningless and there is no hypothesis to be refuted" (Shepperd,
1988:35). He points out that researchers have also tried to measure inter-module
complexity with cyclomatic complexity, although cyclomatic complexity can only
measure intra-module complexity. McCabe suggests that, to measure the
complexity of a program, the cyclomatic complexity of each module be added to
the number of modules to give an overall complexity score. This does not take
into account that an astute partitioning of a program into modules makes each
smaller module's control flow easier to understand. Also, the data flow among
modules is ignored completely.
A weakness of cyclomatic complexity is that it cannot measure the
complexity of software that is due to size (Ramamurthy and Melton, 1986:310).
A 10,000-line program with only 8 decision points is intuitively complex, but its
cyclomatic complexity of 9 is not high enough to attract attention, as McCabe
states that 10 is a "reasonable" upper limit for cyclomatic complexity (McCabe,
1983:9). This example is somewhat far-fetched, but it illustrates the problem.
Because cyclomatic complexity looks at a graph representation of the program,
and not at the program text itself, it cannot distinguish a program from a more
structured version of the same program: the same number of conditional
statements exists in each version (Woodward and others, 1983:103).

A problem that many researchers have had with cyclomatic complexity is
that it does not take the nesting levels of branch statements into account
(Myers, 1977:62). According to Harrison, "predicates with compound conditions
are more complex than predicates with a single condition" (Harrison and others,
1982:70). Myers suggests calculating complexity as a range, with the lower
bound as the number of decision statements plus one, and the upper bound as
the number of individual conditions plus one (Myers, 1977:63). This modification
of cyclomatic complexity apparently allows a finer distinction between programs
with compound conditional statements, but no experiments have been published
supporting this viewpoint.
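Myers' interval is simple enough to state as a function. In this Python sketch, each decision statement is represented only by its number of individual conditions; that encoding, and the function name, are assumptions made for illustration.

```python
def myers_interval(decisions):
    """Myers' complexity interval (lower, upper).

    decisions: one integer per decision statement, giving the number of
    individual conditions in its predicate (1 for `a > b`, 2 for
    `(a > b) AND (c > d)`).
    """
    lower = len(decisions) + 1   # decision statements plus one
    upper = sum(decisions) + 1   # individual conditions plus one
    return lower, upper


# One simple IF and one IF with a two-part compound condition.
print(myers_interval([1, 2]))  # (3, 4)
```

The lower bound is the ordinary cyclomatic number, so two programs with the same v(G) are separated only by their upper bounds.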

Knot Count. Knot count was derived from two simple measures of
complexity. In 1968, the Communications of the ACM published the now famous
letter by Dr. Edsger W. Dijkstra entitled "Go To Statement Considered Harmful."
In this letter, Dijkstra stated that the "quality of programmers is a decreasing
function of the density of GOTO statements" (Woodward and others, 1983:101).
This letter suggested that the number of GOTO statements in a program is a
simple measure of unstructuredness. Discussing the theoretical basis of the
knot count metric, Woodward points out that in the book Software Metrics, Gilb
states that "logical complexity is a measure of the degree of decision making
within a system and that the number of IF statements is a rough measure of
this complexity" (Woodward and others, 1983:101).

Knot count measures the "relations between the physical locations of
control transfers rather than simply their numbers" (Harrison and others,
1982:71). Knot count is a measure of program "unstructuredness": it examines the
GOTO statements to count the number of crossing control transfers. These
control transfer crossings are knots, and the greater the number of knots, the
more complex the program is. Knots represent the "unstructuredness" of the
source code text, but they do not represent the program's underlying control flow
(Howatt, 1988).

Woodward defines a knot as:

If a jump from line a to line b is represented by the
ordered pair of integers (a,b), then jump (p,q) gives rise to a "knot"
or crossing point with respect to jump (a,b) if either

1) $\min(a,b) < \min(p,q) < \max(a,b)$ and $\max(p,q) > \max(a,b)$

or

2) $\min(a,b) < \max(p,q) < \max(a,b)$ and $\min(p,q) < \min(a,b)$
(Woodward and others, 1983:102)

An example from Woodward with nine knots is illustrated in Figure 7. An
advantage that the knot count has over cyclomatic complexity is that a program
that is unstructured and has a high knot count can be rewritten in a more
structured fashion and have fewer knots. This is because the number of knots
in a program depends on the order of the statements (Woodward and others,
1983:103). Harrison says the knot count is an interesting metric, but no
research has applied it to the maintenance of programs (Harrison and others,
1982:71).
      CALL TPR
      IF (ZR) 500, 500, 100
  100 CALL TED
  150 IF (Z3) 200, 200, 550
  200 ZG = ZG + 1
      ZC = 0
      CALL TCO
  300 CALL TRA
      GOTO 2000
  500 CONTINUE
      Z3 = 1
      GOTO 150
  550 CONTINUE
      CALL TEC
      ZB = ZB + 1
      ZC = ZC + 1
      GOTO 300
 2000 RETURN
      END

Figure 7. Knot Example
(Woodward and others, 1983:104)
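Woodward's definition translates directly into code. The Python sketch below numbers the statements of Figure 7 from 1 (CALL TPR) to 19 (END); this numbering is an assumption, but only the relative order of the lines matters. Each jump is written as (from_line, to_line), with the targets of an arithmetic IF listed separately and a repeated target counted once. The function names are illustrative.

```python
from itertools import combinations


def is_knot(jump1, jump2):
    """Woodward's condition: does jump (p, q) cross jump (a, b)?"""
    (a, b), (p, q) = jump1, jump2
    lo, hi = min(a, b), max(a, b)
    return ((lo < min(p, q) < hi and max(p, q) > hi) or
            (lo < max(p, q) < hi and min(p, q) < lo))


def knot_count(jumps):
    """Count crossing pairs among the distinct (from_line, to_line) jumps.
    The condition is symmetric, so each unordered pair is tested once."""
    return sum(is_knot(j1, j2) for j1, j2 in combinations(set(jumps), 2))


# Figure 7, statements numbered 1 (CALL TPR) to 19 (END).
FIGURE7_JUMPS = [
    (2, 10), (2, 3),   # IF (ZR) 500, 500, 100
    (4, 5), (4, 13),   # 150 IF (Z3) 200, 200, 550
    (9, 18),           # GOTO 2000
    (12, 4),           # GOTO 150
    (17, 8),           # GOTO 300
]
print(knot_count(FIGURE7_JUMPS))  # 9
```

The result agrees with the nine knots Woodward reports for this example.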


MEBOW (MEasure Based On Weights). MEBOW was developed as a
control flow metric that measures complexity as well as three other metrics
(cyclomatic complexity, knot count, and Harrison's SCOPE ratio) do, but without
their deficiencies (Jayaprakash and others, 1987:238). MEBOW is a modification
of cyclomatic complexity that adds the knot count and assigns different weights
to different control structures.

The developers of MEBOW state that program complexity is best
measured by control flow metrics, but that factors other than control flow must
also be considered (Jayaprakash and others, 1987:238). This is why they count
knots and weight different types of branch statements differently. Following
Woodward's argument that knots create unstructured programs, they state "the
identification of knots helps in assigning higher control flow complexity to
programs containing unstructured forms" (Jayaprakash and others, 1987:240).

MEBOW weights a backward branch or knot more heavily than a forward branch
or knot, on the principle that while a backward branch does not necessarily lead
to a loop, it does make a top-down reading of a program more difficult, and
therefore makes the program more complex. Explicit branch statements such as a
GOTO also have a higher weight than an implicit branch that is associated with a
structured programming construct such as a FOR loop.

To determine the MEBOW value for a module, the sum of the weights of all
branches and knots is calculated. Branches and knots are the only two basic
programming elements that are assigned weights in MEBOW. Like cyclomatic
complexity, MEBOW is extended across modules by summation: the MEBOW value
for a program with more than one procedure is the sum of each procedure's
MEBOW value (Jayaprakash and others, 1987:241).
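The summation rule can be sketched in Python, but the weights below are hypothetical placeholders: the actual MEBOW weights published by Jayaprakash and others are not given in this text. The placeholders only respect the ordering described above (backward over forward, explicit over implicit).

```python
# HYPOTHETICAL weights: illustrative placeholders, NOT the published MEBOW
# values. They only preserve the ordering described in the text.
WEIGHTS = {
    ("branch", "implicit", "forward"): 1,
    ("branch", "implicit", "backward"): 2,
    ("branch", "explicit", "forward"): 2,
    ("branch", "explicit", "backward"): 3,
    ("knot", "forward"): 2,
    ("knot", "backward"): 3,
}


def mebow(elements):
    """Sum of the weights of all branches and knots in one procedure."""
    return sum(WEIGHTS[e] for e in elements)


def mebow_program(procedures):
    """Program value: the sum of each procedure's MEBOW value."""
    return sum(mebow(p) for p in procedures)


# Procedure 1: a FOR loop (implicit backward branch).
# Procedure 2: a forward GOTO that also forms one forward knot.
PROGRAM = [
    [("branch", "implicit", "backward")],
    [("branch", "explicit", "forward"), ("knot", "forward")],
]
print(mebow_program(PROGRAM))  # 6 under the placeholder weights
```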

As  with  the  knot  count,  there  is  no  research  that  shows  MEBOW 
effectively  measures  the  maintainability  of  programs.  Also,  MEBOW's  authors 
state  that  it  can  be  used  to  measure  inter-procedure  complexity,  but  it  ignores 
the  data  flow  between  procedures. 

Composite  (Hybrid)  Metrics 

A composite or hybrid metric is one that does not just measure a single
factor to determine the complexity of software. As suggested by Kafura, Conte,
and Hansen, different types of metrics measure significantly different aspects of
software. For example, size metrics alone cannot reflect which of two 1000-line
procedures is more complex. Control structure metrics cannot differentiate
between a program that uses a global pointer structure and another program
that performs the same function on an array passed as a parameter through a
well-defined interface.

Since  most  metrics  capture  only  one  factor  of  complexity,  it  makes  sense 
to  use  different  metrics  and  to  combine  the  results  into  a  vector.  Harrison,  and 
Li  and  Cheung,  all  assert  in  different  articles  that  using  a  hybrid  metric  to 
measure  complexity  is  "the  most  sensible  approach.  Software  complexity  is 
caused  by  so  many  different  factors  that  measuring  only  one  of  them  cannot 
help  but  give  unreliable  results  for  a  general  case"  (Harrison  and  others, 
1982:78;  Li  and  Cheung,  1987:708). 

As empirical evidence that this type of composite metric does work,
Kafura used a combination of the LOC and information flow metrics to identify
procedures that had higher error rates and coding times in three large NASA
Fortran projects. The composite metric identified the error and coding time
outliers more often than any other code or structure metric he tested.

Two other composite metrics that have some data supporting their use are
Ramamurthy and Melton's synthesis of Software Science metrics and the
cyclomatic number, and Li and Cheung's NEW_1. The Software Science and
cyclomatic complexity metric weights the operator and operand counts by the
nesting level, so that an operator in a purely sequential program is not
counted, while the same operator nested three levels deep counts as three
operators. Ramamurthy and Melton have evidence that weighted length and
effort detect differences in complexity between programs better than
non-weighted length and effort do (Ramamurthy and Melton, 1986:312). NEW_1 is a
composite of SCOPE, which is a control graph metric, and the Software Science
effort metric. This metric combines a graph metric with a size metric in an
attempt to gain the benefits of both types of metric (Li and Cheung,
1987:702).
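Following the description above, the nesting weighting can be sketched as a sum over token occurrences. This minimal Python illustration assumes each token has already been tagged with its nesting level; how Ramamurthy and Melton derive nesting levels from the flow graph is not reproduced here, and the function name is hypothetical.

```python
def weighted_length(operators, operands):
    """Nesting-weighted Software Science length.

    operators, operands: lists of (token, nesting_level) pairs. Each
    occurrence contributes its nesting level, so tokens in purely
    sequential code (level 0) are not counted.
    """
    return (sum(level for _token, level in operators)
            + sum(level for _token, level in operands))


# Hypothetical fragment: one ':=' at level 0 and one '+' nested 3 deep.
print(weighted_length([(":=", 0), ("+", 3)], []))  # 3
```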

When two or more metrics are used, some difficulties in interpreting the
data may arise. As an example, let a and b be metrics. If, for two procedures 1
and 2, $a_1 > a_2$ and $b_1 > b_2$, then procedure 1 is apparently more complex
than procedure 2. But if the metrics are used to measure the same two procedures
and the results are $a_1 > a_2$ but $b_1 < b_2$, it is not clear which procedure is
more complex. According to Conte (Conte and others, 1986:80), this problem is
why more researchers do not use composite metrics.
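The comparison problem Conte describes amounts to checking whether one metric vector dominates another. A minimal Python sketch, assuming that "more complex" requires a strictly higher value on every metric, as in the example above; the function name and metric values are hypothetical.

```python
def more_complex(m1, m2):
    """Compare two procedures' metric vectors (assumed equal length).

    Returns 1 if procedure 1 is higher on every metric, -1 if procedure 2
    is higher on every metric, and None when the metrics disagree or tie,
    in which case no ordering follows.
    """
    if all(x > y for x, y in zip(m1, m2)):
        return 1
    if all(x < y for x, y in zip(m1, m2)):
        return -1
    return None


# Hypothetical (a, b) values for procedures 1 and 2.
print(more_complex((12, 40), (8, 25)))  # 1: a1 > a2 and b1 > b2
print(more_complex((12, 20), (8, 25)))  # None: a1 > a2 but b1 < b2
```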

Halstead's Software Science (E). Another Software Science metric is Effort,
"E", which is used to measure the number of "elementary mental discriminations"
(Shen and others, 1983:156) that a programmer will have to make to produce the
desired program. It is termed a hybrid because it is calculated from an estimate
of the number of "mental comparisons" needed to write a program of a certain
length and an estimate of the "program level", which represents a program
written with minimum size (Conte and others, 1986:83). E can be approximated
by (Conte and others, 1986:84):

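$$E = \frac{\eta_1 N_2 N \log_2 \eta}{2\,\eta_2} \tag{7}$$

where

$\eta_1$ = the number of unique operators
$\eta_2$ = the number of unique operands
$N_1$ = the total number of operator occurrences
$N_2$ = the total number of operand occurrences

$$\eta = \eta_1 + \eta_2 \tag{8}$$

$$N = N_1 + N_2 \tag{1}$$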

Using the example source code from Figure 2, the number of unique
operators is $\eta_1 = 11$ and the number of unique operands is $\eta_2 = 6$. $N_1$
was 23, and $N_2$ was 18. From these values, the estimated value of Effort is
E = 3247.
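Equation (7) can be evaluated with a short Python function. This is a sketch: the counts passed in must come from a chosen operator/operand counting strategy, which the function does not supply, and the function name is illustrative.

```python
from math import log2


def halstead_effort(n1, n2, N1, N2):
    """Estimated effort E from equation (7).

    n1, n2: numbers of unique operators and unique operands
    N1, N2: total occurrences of operators and operands
    """
    n = n1 + n2  # vocabulary, equation (8)
    N = N1 + N2  # length, equation (1)
    return (n1 * N2 * N * log2(n)) / (2 * n2)


print(halstead_effort(2, 2, 4, 4))  # 32.0
```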
The E metric was originally used to estimate the actual time a programmer
would take to implement a program. This use was questioned by other researchers
when they realized that it implies an arbitrary limit on the mental capacity of
all programmers (Shen and others, 1983:156). While there is little evidence
supporting the claim that E can predict the time to implement a program, there
is empirical evidence that this metric correctly estimates maintenance effort and
the number of errors in modules (Shen and others, 1983:162; Henry and others,
1983:130). Discussing studies conducted at Purdue and at General Electric, Shen
claims "these two studies tentatively support the conclusion that a program with
a lower E measure is easier to comprehend than an equivalent program with a
higher E value" (Shen and others, 1983:162).

Hansen's Pair (Cyclomatic Number, Operator Count). Shortly after Myers
suggested his extension to the cyclomatic complexity metric, Hansen came up
with a different way to modify cyclomatic complexity to get better data. He
agreed with Myers that a branch statement with multiple conditions differs in
complexity from a branch statement with only one condition, but he argued that
the difference was not relevant because, no matter how many conditions the
branch has, control goes to one location or the other (Hansen, 1978:30).

Hansen decided not to extend cyclomatic complexity, but to pair it with a
size metric instead. After experimenting, he decided that the unique operator
count worked best in combination with cyclomatic complexity (Hansen, 1978:33). He
did not, however, include any empirical evidence that he had validated the
method.

Oviedo's model of program complexity. Oviedo developed a composite
metric that measures both control flow complexity and data flow complexity, and
reports total program complexity as the sum of the two. Control flow "cf"
complexity is calculated as the number of edges in a control flow graph (Oviedo,
1980:148). Data flow "df" complexity is explained in the following paragraph.
The program complexity (C) is calculated as (Harrison and others, 1982:76):

$$C = \alpha\,cf + \beta\,df \tag{9}$$

where α and β are weighting factors, which are set to one (Oviedo, 1980:151).

To understand "df", two terms must be defined. A variable is "locally
available" for a block if the variable has been defined within the block. A
variable is "locally exposed" if it is referenced in a block but has not yet been
defined in that block. The "df" of a node or block $N_j$ is defined as "the
number of prior definitions of locally exposed variables in $N_j$ that can reach
$N_j$" (Harrison and others, 1982:76). Figure 8 from Harrison shows code that will
be used for an example "df" calculation, along with its program flow graph
(Harrison and others, 1982:76).

The "df" of $N_0$ is always 0, because no prior definitions can reach this
block. The two nodes $N_1$ and $N_2$ each have a "df" of 0, because they are
assignment statements that use constants, so no variables are locally exposed.
The node $N_3$ has three locally exposed variables: x, j, and k. Each of these
exposed variables has two definitions that reach node $N_3$, so node $N_3$ has a
"df" of 6. Adding the "df" of all nodes gives an overall "df" of six, as
$df_0 = df_1 = df_2 = 0$. As the control flow graph has four edges, "cf" is four.
The overall program complexity is therefore C = 4 + 6 = 10.

The Oviedo program complexity has the same limitations as other control
flow graph metrics. The size complexity of any node will not be measured, and
establishing a value for the weighting factor α will be difficult (Harrison and
others, 1982:78). No empirical evidence has been reported to show how well this
combination of control structure and data structure metrics works.


Node 0:   READ n, x, k
          IF n = 1 THEN
Node 1:      x := 1
             j := 2
             k := 5
          ELSE
Node 2:      k := 1
             j := 3
          ENDIF
Node 3:   d := x + j + k

Figure 8. Program Complexity Example
(Harrison and others, 1982:76)
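The "df" calculation is a reaching-definitions problem, so it can be sketched with the standard iterative data flow algorithm. In this Python illustration each block of Figure 8 is hand-encoded as a list of (defined variables, used variables) pairs, one per statement; that encoding, the reading of the first statement as READ n, x, k, and the function names are assumptions. With α = β = 1 it reproduces the values computed above.

```python
def locally_exposed(stmts):
    """Variables referenced in a block before the block defines them."""
    defined, exposed = set(), set()
    for defs, uses in stmts:
        exposed |= uses - defined
        defined |= defs
    return exposed


def reaching_definitions(blocks, edges):
    """For each block, the (variable, defining_block) pairs reaching its entry."""
    preds = {n: [a for a, b in edges if b == n] for n in blocks}
    out = {n: set() for n in blocks}
    changed = True
    while changed:  # iterate until no block's OUT set changes
        changed = False
        for n, stmts in blocks.items():
            reach = set().union(*(out[p] for p in preds[n]))
            defined = {v for defs, _ in stmts for v in defs}
            # A block's own definitions kill earlier ones of the same variable.
            new_out = {(v, n) for v in defined} | {d for d in reach if d[0] not in defined}
            if new_out != out[n]:
                out[n], changed = new_out, True
    return {n: set().union(*(out[p] for p in preds[n])) for n in blocks}


def oviedo_complexity(blocks, edges, alpha=1, beta=1):
    """Return (cf, df, C) with C = alpha * cf + beta * df (equation 9)."""
    reach = reaching_definitions(blocks, edges)
    df = 0
    for n, stmts in blocks.items():
        exposed = locally_exposed(stmts)
        df += sum(1 for var, _ in reach[n] if var in exposed)
    cf = len(edges)
    return cf, df, alpha * cf + beta * df


FIGURE8_BLOCKS = {
    0: [({"n", "x", "k"}, set()), (set(), {"n"})],        # READ; IF n = 1
    1: [({"x"}, set()), ({"j"}, set()), ({"k"}, set())],  # x, j, k := consts
    2: [({"k"}, set()), ({"j"}, set())],                  # k, j := consts
    3: [({"d"}, {"x", "j", "k"})],                        # d := x + j + k
}
FIGURE8_EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(oviedo_complexity(FIGURE8_BLOCKS, FIGURE8_EDGES))  # (4, 6, 10)
```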


Summary 

This  chapter  has  presented  examples  of  size,  structure,  and  composite 
metrics.  The  metrics  shown  are  representative  of  the  three  different  types  of 
metric. 

The size metrics are easy to calculate from source code, but they measure
complexity by treating the kinds of constructs from which a program is built as
irrelevant; only the number of these constructs is important. These metrics
cannot be used until far into the software development cycle, as they need
actual source code. This means that they are not good design tools, but they can
show which procedures will have the greatest number of changes and errors
during software test and maintenance.

Some data structure metrics can be used earlier in the design cycle, which
can give early feedback on the quality of the software. The structure metrics
are better able to test the structure of algorithms and data structures, which
are the basic framework of all programs, as Niklaus Wirth suggests by the title
of his classic book Algorithms + Data Structures = Programs.

Composite metrics are apparently not in common use. But a careful
selection of different types of metrics that complement each other's
weaknesses can give a software engineer an insight into the program structure
and possible problem areas that no single metric can.
III. Metric Selection Criteria


Introduction 

This chapter describes a set of guiding properties that were used to
evaluate software complexity metrics. These properties are presented as
guidelines to determine how well individual metrics measure complexity, and
therefore maintainability. Two benefits are derived from comparing the metrics to
these criteria: the metrics that more completely measure complexity are
identified, and the characteristics each metric best reflects are indicated.

The utility of the first benefit is obvious, but the need for the second
requires explanation. If no single metric meets all criteria, metrics that
complement each other can be used instead. Using these criteria to screen the
metrics shows how each metric can best be applied. As Kearney says, "the
properties of a metric determine the ways in which it can be used" (Kearney
and others, 1986:1046). If, for example, a metric that measures control flow
complexity does not measure data flow complexity, it can be combined with
another metric that meets the data flow complexity criteria, and more complete
coverage will result.

After the presentation of these criteria, a comparison shows which metrics
meet each criterion. The metrics that best fit all of the criteria are discussed.
Following that section, a summary of the selection guidelines and the metrics is
presented.

Criteria  for  the  Selection  of  a  Maintainability  Metric 

These criteria are basic guidelines to determine how well a metric
measures complexity. These guidelines are loosely arranged into three overall
groups. The first five criteria are generic and could be used to evaluate other
types of metrics, such as productivity metrics. They do not determine how well
a metric reflects complexity; they verify that the metric is generally applicable
across different software. Three criteria are presented to rate how well metrics
measure a program's control flow complexity. The last four criteria indicate how
well a metric measures a module's data flow complexity.

These guidelines are not equally important in evaluating a complexity
metric. For example, the two criteria "Ranking Basic Control Structures" and
"Nesting and Compound Conditions" are both contained in the criterion
"Accurately Reflect Control Flow". They are explicitly enumerated because each
criterion is important, but neither is as weighty a consideration as overall
control flow.

Although the criteria are not equally important, a definitive weighting of
their relative importance is not given, except for the implied subordination
within the control flow and data flow complexity sections. Research to date does
not suggest any obvious ranking of criteria. Therefore, the criteria are
arbitrarily weighted equally. Under this constraint, any metric that satisfies
more criteria than another will be judged to better measure complexity.

Clear and Unambiguous. This criterion determines how easily the metric
can be evaluated and how easily the result of the evaluation can be compared
to the results of other evaluations. As Conte asks, "does the metric lead to a
simple result that is easily interpreted" (Conte and others, 1986:22)? The
metric should be clear and unambiguous so that it can be calculated from just the
source code (Levitin, 1986:314).

Lines of code (LOC) is an example of a metric that is not clear and
unambiguous. While it may appear to be very easy to count, many researchers
have different counting strategies. This can lead to different LOC values from
the same source code.

Halstead's N metric is another metric that appears simple to calculate
but presents some difficulty in its calculation. Which tokens should be counted
as operators and which should be counted as operands is not always clear. Even
after a counting strategy has been defined and adhered to, some function calls
defy analysis because they act as both operators and operands.

Intuitive. A metric should be intuitively appealing. It should correspond
to a user's innate perception of a program's complexity. The complexity value
determined for a less complex module should be less than that of an obviously
more complex module. The value must always be positive and additive (Levitin,
1986:314; Jayaprakash and others, 1987:241).

For example, if two distinct pieces of code are combined, the complexity
value for the joined code should be greater than the complexity value for either
piece. The optimum occurs when the complexity value for the joined code equals
the sum of the complexity values for the separate pieces.

Language Independent. A metric that is based on a single language is not
generally applicable. A metric should be as universally applicable as possible
so that it can be used to evaluate software written in any programming
language (Jayaprakash and others, 1987:241).

A metric that estimates complexity based on the number of GOTO
statements in a program may be a valuable quality measurement tool when used
with the FORTRAN language. This metric would be worthless when used with
the Prolog language, as Prolog does not have any GOTO statements. This metric
would also be of limited utility in measuring the complexity of modules written
in Pascal or Ada. While both of these languages allow a GOTO statement, such
use is heavily discouraged. Therefore, the likelihood of determining a
reasonable value for a Pascal module's complexity using this metric is negligible.

The FORTRAN language allows only a single statement to be placed on a
line. This simplifies the calculation of LOC. Other languages such as JOVIAL,
Pascal, and C have special characters that delimit the end of a statement. With
these languages, several statements can be placed on one line, which complicates
the calculation of LOC for modules written in these languages. For this
reason, LOC is not a language independent metric.

Prescriptive. A complexity metric should not only measure the software's
complexity; it should also reveal how the software should be modified to
minimize complexity (Kearney and others, 1986:1047). The metric's results
should direct the software's maintainers to the modules that need to be
changed, and they should reveal to the maintainers what changes need to be made.

If a module has a large value for information flow (INFO), a developer or maintainer should take this as a sign that the module needs to be decomposed further into more manageable modules and that the inter-module communications should be simplified. One complication is that a software developer who wants the smallest possible INFO value can write the program as a single module with no interconnections between modules. This programming practice would increase the complexity of the program, not decrease it as the INFO value would lead us to believe.
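A short sketch makes the loophole concrete. It assumes the Henry and Kafura form of information flow, length × (fan-in × fan-out)², computed for each procedure. The module sizes and fan counts below are hypothetical. A monolithic program with no inter-module flows scores zero, while the same code split into three connected modules scores higher:

```python
def info_flow(length: int, fan_in: int, fan_out: int) -> int:
    """Henry-Kafura style information flow: length * (fan_in * fan_out)**2."""
    return length * (fan_in * fan_out) ** 2


# Hypothetical 300-line program written as one module: no flows in or out.
monolith = info_flow(length=300, fan_in=0, fan_out=0)

# Hypothetical decomposition into three 100-line modules, each with one
# incoming and one outgoing flow.
decomposed = sum(info_flow(length=100, fan_in=1, fan_out=1) for _ in range(3))

print(monolith, decomposed)  # 0 300
```

Because any zero fan-in or fan-out makes the product zero, the formula rewards removing interfaces regardless of how large the remaining module becomes.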

Robustness. A trivial reordering of the program's statements should not lessen the complexity the metric reflects. A reduction in a metric's measurement should result from an improvement in the program measured (Kearney and others, 1986:1047). Adherence to this criterion forces any programming practice that reduces the metric value to also reduce the program's complexity. As Conte asked, "is the metric sensitive to the artificial manipulation of some factors that do not affect the performance of the software" (Conte and others, 1986:22)?

As an example, a program that scored high on McCabe's cyclomatic complexity measure because of the number of loops could be rewritten with the loops as in-line code. This would lower the cyclomatic complexity score, but it might significantly increase the complexity of the module.
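Here is a small sketch of the effect. It computes v(G) = e − n + 2p from hand-built control flow graphs. The graphs are illustrative and are not taken from a real program. Unrolling a three-iteration loop removes its only decision, so v(G) drops from 2 to 1 even though the module now contains three copies of the loop body:

```python
def cyclomatic(edges, components=1):
    """McCabe's v(G) = e - n + 2p for a control flow graph given as edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Loop executed three times: the test node branches into the body or out.
loop = [("entry", "test"), ("test", "body"), ("body", "test"), ("test", "exit")]

# The same three iterations written as in-line code: a straight chain.
inline = [("entry", "b1"), ("b1", "b2"), ("b2", "b3"), ("b3", "exit")]

print(cyclomatic(loop), cyclomatic(inline))  # 2 1
```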

Accurately  Reflect  Control  Flow.  The  control  flow  in  a  program  is  the 
path through a program that is followed during execution. By measuring the
number  of  paths  through  a  module,  a  determination  can  be  made  if  the  module  is 
becoming  difficult  to  understand  and  should  be  partitioned  into  separate,  smaller 
modules  (McCabe,  1983:3).  A  satisfactory  control  flow  metric  should  measure 
how  easily  understandable  the  control  structures  are  in  a  program. 

Ranking Basic Control Structures. Structured programming methodology recognizes three basic control constructs: sequential, selection, and repetition constructs (Prather, 1984:341). Sequential statements have the lowest control complexity, because the flow of control always passes to the immediately following statement. Selection statements can branch to one or more other statements. In a selection statement's basic form, either one section of code or another is executed. Whichever section completes execution, control flow continues from the same point. A repetition statement also has two locations from which it can continue execution. Either control goes to the statement following the bottom of the loop, or control passes back to the top of the loop and the statements within the loop are executed again.

Jayaprakash recommends that any control flow metric should show that sequential statements are less complex than selection statements, which are less complex than repetition statements (Jayaprakash and others, 1987:241). Repetition statements should be counted as more complex than selection statements because they cause backward branches in the code, and "it is well known that these [backwards branches] cause the most difficulty in practice" (Prather, 1984:345).

Nesting and Compound Conditions. A complexity measure should be sensitive to nesting in branch statements. Several researchers have stated that a module containing two selection statements, one nested inside the other, is more complex than a different module in which the identical two conditions occur in sequence (Jayaprakash and others, 1987:242; Myers, 1977:62). A selection statement that has a compound condition is slightly more complex than a selection statement with only a single condition, and a complexity metric should reflect this added complexity.
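The sketch below illustrates the criterion. It compares v(G), with each IF counted as one decision, against a hypothetical nesting-weighted count. The weights were chosen only for illustration and are not MEBOW's. v(G) cannot distinguish the nested pair from the sequential pair. The weighted count ranks the nested version higher and adds a charge for a compound condition:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Decision:
    """An IF statement: its number of conditions and the decisions nested in it."""
    conditions: int = 1
    body: List["Decision"] = field(default_factory=list)


def cyclomatic(decisions: List[Decision]) -> int:
    """v(G) for structured code: one plus the number of IF statements."""
    def count(nodes: List[Decision]) -> int:
        return sum(1 + count(n.body) for n in nodes)
    return count(decisions) + 1


def nesting_weighted(decisions: List[Decision], depth: int = 1) -> int:
    """Hypothetical score: each IF costs its nesting depth plus one per
    extra condition in a compound test."""
    total = 0
    for d in decisions:
        total += depth + (d.conditions - 1)
        total += nesting_weighted(d.body, depth + 1)
    return total


sequential = [Decision(), Decision()]       # if a ...; if b ...
nested = [Decision(body=[Decision()])]      # if a: if b ...
single = [Decision()]                       # if a ...
compound = [Decision(conditions=2)]         # if a and b ...

print(cyclomatic(sequential), cyclomatic(nested))              # 3 3
print(nesting_weighted(sequential), nesting_weighted(nested))  # 2 3
print(nesting_weighted(single), nesting_weighted(compound))    # 1 2
```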

Accurately  Reflect  Data  Flow.  Another  factor  that  impacts 
complexity  is  the  data  flow  into,  within,  and  out  of  the  module.  To  better 
understand  the  module's  complexity,  the  complexity  of  this  data  flow  should  be 
measured  in  addition  to  the  module's  control  flow  complexity.  According  to 
Harrison, "another factor that influences software complexity is the configuration
and  use  of  data  within  the  program.  Several  methods  can  be  used  to  measure 
complexity  by  the  way  program  data  are  used,  organized,  or  allocated"  (Harrison 
and  others,  1982:67). 

Indicates Data Amount. A basic factor that determines the complexity of the data flows within a module is the amount of data that a maintainer has to comprehend. A large number of variables that must be understood can make the maintainer's task very difficult. These variables include the variable parameters and global data flowing into and out of a module, together with the variables declared and used locally within the module.
Shows Data Use. How the variables are actually used in a module is another important determinant of module complexity. In a long module, it can be arduous to determine which variable is modified and where. A variable that is used and modified within a small portion of a module is easier to keep track of. If, conversely, a variable is set once at the beginning of a module and then not used for 100 lines, the maintainer may have trouble remembering its value.
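The text gives no formula for data use. The sketch below therefore assumes one plausible proxy: the longest gap, in lines, between successive references to a variable (sometimes called its span). The regular-expression matching is deliberately naive, and a real implementation would need a proper tokenizer:

```python
import re
from collections import defaultdict
from typing import Dict, List


def reference_lines(lines: List[str], names: List[str]) -> Dict[str, List[int]]:
    """Map each variable name to the 1-based line numbers that mention it."""
    refs: Dict[str, List[int]] = defaultdict(list)
    for lineno, text in enumerate(lines, start=1):
        for name in names:
            if re.search(rf"\b{re.escape(name)}\b", text):
                refs[name].append(lineno)
    return refs


def longest_gap(line_numbers: List[int]) -> int:
    """Largest distance between consecutive references to one variable."""
    return max((b - a for a, b in zip(line_numbers, line_numbers[1:])), default=0)


# Hypothetical module: x is set on line 1 and not used again until line 101.
module = ["x = 0"] + ["y = y + 1"] * 99 + ["print(x, y)"]
refs = reference_lines(module, ["x", "y"])
print(longest_gap(refs["x"]), longest_gap(refs["y"]))  # 100 1
```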

Reflects Inter-Module Data Links. The coupling of the module is
reflected  by  the  number  of  data  links  into  and  out  of  the  module.  Measuring 
the  data  links  is  important  because  "by  observing  the  patterns  of  communication 
among  the  system  components  we  are  in  a  position  to  define  measurements  for 
complexity,  module  coupling,  level  interactions,  and  stress  points"  (Henry  and 
Kafura,  1981:511). 
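Once the flows between modules are known, counting the data links is mechanical. The minimal sketch below assumes the flows have already been extracted from the source as (source module, destination module) pairs. The module names are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def fan_counts(flows: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {module: (fan_in, fan_out)} from (source, destination) data links.

    Duplicate links are counted once.
    """
    links = set(flows)
    fan_in = Counter(dst for _, dst in links)
    fan_out = Counter(src for src, _ in links)
    modules = set(fan_in) | set(fan_out)
    return {m: (fan_in[m], fan_out[m]) for m in sorted(modules)}


# Hypothetical module names and data links.
flows = [
    ("read_input", "parse"),
    ("parse", "evaluate"),
    ("parse", "report"),
    ("evaluate", "report"),
]
print(fan_counts(flows))
# {'evaluate': (1, 1), 'parse': (1, 2), 'read_input': (0, 1), 'report': (2, 0)}
```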

Comparison  of  Metrics  by  Selection  Criteria. 

Figure  9  shows  relationships  between  metrics  described  in  Chapter  Two 
and  the  selection  criteria  developed  previously.  This  is  presented  in  a  table  to 
better  show  the  relationships  between  size  metrics  and  general  complexity,  data 
structure  metrics  and  data  complexity,  control  structure  metrics  and  control 
complexity,  and  the  hybrid  metrics. 

The metric selection criteria are shown across the top of the grid, in the same order in which they were described earlier in this chapter. They are separated into three groups, just as their descriptions were. The metrics are listed in the order in which they were described in Chapter Two, and in the same groups established there: size metrics, data structure metrics, control structure metrics, and hybrid metrics.
To show that a metric satisfies a criterion, a mark is placed in the box at the intersection of the criterion's column and the metric's row. The absence of a mark means that the metric either does not satisfy the criterion at all or satisfies it poorly. An indication of partial agreement (*) means that the metric does not fully meet the criterion but does help measure it. An indication of substantial agreement (!) means that the metric fully satisfies the criterion. Justification for the indications is given in Appendix A, Justification for Metric Complexity Criteria Ratings.

Figure  9  shows  that  size  measures  reflect  neither  control  structure 
complexity  nor  data  structure  complexity.  Neither  data  structure  metric  can 
measure  control  structure  complexity,  nor  can  any  control  structure  metric 
measure data structure complexity. Oviedo's "C" hybrid metric, which measures both control structure complexity and data structure complexity, scores best across the whole spectrum. If a single metric from the above list had to be chosen, "C" fits the criteria better than any other.

The most complete control flow complexity measure is MEBOW, which has more agreement indications than "C". The most complete data flow complexity measure is INFO, which has more substantially-agree indications than "C" within the data flow criteria. Together, MEBOW and INFO have substantially-agree marks in eight categories and partial-agree marks in three of the other four criteria. The only criterion that neither MEBOW nor INFO meets is the "Shows Data Use" criterion. Because the pair covers so many of the metric selection criteria, a hybrid metric combining MEBOW and INFO in a two-dimensional vector is suggested.
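The coverage argument for a two-metric vector can be sketched as follows. The sketch assumes that a pair receives, for each criterion, the stronger of its two members' marks. The rows below are invented for illustration and are not the ratings shown in Figure 9:

```python
from typing import Dict, List

# Marks used in Figure 9: no mark, partial agreement (*), substantial agreement (!).
RANK = {"": 0, "*": 1, "!": 2}


def combine(row_a: List[str], row_b: List[str]) -> List[str]:
    """Per criterion, keep the stronger of the two metrics' marks."""
    return [max(a, b, key=RANK.__getitem__) for a, b in zip(row_a, row_b)]


def tally(row: List[str]) -> Dict[str, int]:
    """Count how many criteria receive each mark."""
    return {mark: row.count(mark) for mark in RANK}


# Hypothetical four-criterion rows, not the actual Figure 9 ratings.
control_metric = ["!", "!", "", "*"]
data_metric = ["*", "", "!", ""]
pair = combine(control_metric, data_metric)
print(pair, tally(pair))  # ['!', '!', '!', '*'] {'': 0, '*': 1, '!': 3}
```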

A strong case can be made for using a combination of the "C" metric and INFO. Control flow complexity would be measured nearly as well, and every data flow criterion would be met. However, the "C" metric lacks empirical evidence that it actually measures complexity. MEBOW itself has not been tested, but a substantial weight of evidence supports the use of v(G), which is a component of MEBOW. There is also empirical evidence that knot count measures program complexity (Woodward and others, 1983:105), although no studies have shown a correlation between knot count and program maintainability. Because of the evidence supporting MEBOW's components, MEBOW is recommended instead of "C".

Figure 9. Metrics vs. Metric Selection Criteria

Summary 

The purpose of this chapter was to define a set of criteria that would help determine which metrics are more useful than others. These criteria were grouped into three categories: the general applicability criteria, the control flow complexity criteria, and the data flow complexity criteria. Each criterion was explained, and the reasons for its importance were given.

This  list  of  metric  selection  guidelines  is  not  a  complete  set  of  possible 
properties  that  a  metric  might  have.  As  Kearney  stated,  "although  the 
preceding  list  of  properties  may  be  flawed,  it  is  essential  that  the  designers  and 
users  of  software  complexity  measures  recognize  that  the  properties  of  measures 
constrain  their  usefulness  and  applicability"  (Kearney  and  others,  1986:1048). 
Overall, it must be remembered that the selection of a metric to measure
maintainability  and  complexity  was  the  desired  end  result.  The  guidelines  were 
chosen  with  that  in  mind. 

Once a set of criteria for judging how well a metric measures complexity had been defined, the criteria were used to evaluate the metrics. Comparing each metric against each criterion identified the metrics with the most complete coverage in each criteria group. A simple comparison of the number of criteria each metric satisfied reduced the candidates to two pairs: MEBOW with INFO, and "C" with INFO. "C" was then disqualified because of its lack of empirical support.
IV. Maintainability Metrics Proposed for AFOTEC Use


Introduction 

In the previous chapter, two metrics were selected for use in measuring maintainability. These metrics measure different aspects of software complexity, so a combination of the two will be more comprehensive than either one alone.
This  chapter  explains  these  metrics  in  greater  detail  than  they  were  discussed 
in  Chapter  Two. 

This explanation includes data that researchers have obtained in various studies. Appendix C, Empirical Support for Hybrid Metrics, gives further empirical data supporting the use of a hybrid metric to measure complexity, obtained by comparing metric values with module error counts. A discussion of the theoretical support for MEBOW and information flow follows. Some of the data support and some repudiate the use of MEBOW (in the form of cyclomatic complexity) and information flow. Both sides of the issue are presented, along with a justification of why the evidence for the metrics' use outweighs the evidence against it.

After  considering  the  empirical  support  for  hybrid  metrics  in  general,  and 
MEBOW  with  information  flow  in  particular,  metric  implementation  considerations 
are  presented.  The  problems  associated  with  parsing  source  code  for  various 
languages  will  be  considered.  Then  rules  for  calculating  the  metrics  will  be 
given.  An  example  of  this  calculation  is  given  in  Appendix  D,  Calculation  of 
Metric  Value  for  an  Ada  Procedure. 

The next discussion centers on the determination of a threshold value and the validation of the metrics. A method that uses test cases to determine threshold ranges is given. A plan for determining how well the metrics measure maintainability is also presented for AFOTEC's use.

Proposed  Maintainability  Metrics 

As  the  last  chapter  explained,  the  two  metrics  MEBOW  and  information 
flow  (INFO)  were  selected.  MEBOW  met  as  many  control  flow  and  general 
complexity  criteria  as  any  other  metric  presented.  No  research  has  been  located 
that  provides  empirical  evidence  that  MEBOW  measures  either  complexity  or 
maintainability.  This  metric  was  proposed  in  1987  and  no  known  studies  have 
been  completed  to  determine  the  worth  of  this  metric.  Fortunately,  MEBOW's 
component  metrics,  cyclomatic  complexity  (v(G))  and  knot  count  (KNOT),  each 
have  empirical  support  as  complexity  measures.  The  authors  of  MEBOW 
analytically  argue  why  the  combination  of  v(G)  and  KNOT  is  a  better  measure  of 
complexity  than  either  metric  is  by  itself. 

INFO  met  as  many  data  flow  and  general  complexity  criteria  as  any  other 
metric  presented.  INFO  also  has  empirical  support  as  a  measure  of  complexity. 
Henry  and  Kafura  also  argue  that  INFO  would  describe  complexity  better  if  it 
were  combined  with  another  complexity  metric,  such  as  Halstead's  length  or  v(G) 
(Henry and Kafura, 1981:513).

The main point of this thesis is that both of these metrics should be calculated for each module of a program. The resulting scores should be compared to determine which modules are more complex than others, and they should also be compared to a threshold to judge which modules are difficult to maintain. Evaluating each module separately, rather than the program as a whole, follows the guidelines given in the Vol. 3. This module-by-module examination is useful and has ample support. Henry and Kafura state that "the complexity of a module is defined to be the sum of the complexities of the procedures within the module. It is interesting to note that the majority of a module's complexity is due to a few very complex procedures" (Henry and Kafura, 1981:514). Basili and Perricone have evidence that errors are usually confined to a single module, so maintenance efforts will usually need to modify only one module. They assert:

it  was  found  that  89  percent  of  the  errors  could  be  corrected  by 
changing  only  one  module.  This  is  a  good  argument  for  the 
modularity  of  the  software.  It  also  shows  that  there  is  not  a  large 
amount  of  interdependence  among  the  modules  with  respect  to  an 
error (Basili and Perricone, 1984:45).
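The module-by-module evaluation can be sketched as follows, with each module's score treated as a (MEBOW, INFO) pair. The threshold values and module scores are hypothetical placeholders, since real thresholds have yet to be determined. The two components can disagree, so this sketch orders two modules only when one is at least as complex as the other in both dimensions:

```python
from typing import Dict, List, NamedTuple


class Score(NamedTuple):
    """Two-dimensional complexity vector for one module."""
    mebow: float
    info: float


def flag(scores: Dict[str, Score], mebow_limit: float, info_limit: float) -> List[str]:
    """Modules whose MEBOW or INFO component exceeds its threshold."""
    return sorted(name for name, s in scores.items()
                  if s.mebow > mebow_limit or s.info > info_limit)


def more_complex(a: Score, b: Score) -> bool:
    """True when a is at least as complex as b in both components and differs
    in at least one; otherwise the pair is not ordered."""
    return a.mebow >= b.mebow and a.info >= b.info and a != b


# Hypothetical scores and thresholds, for illustration only.
scores = {
    "alpha": Score(mebow=4, info=120),
    "beta": Score(mebow=12, info=50),
    "gamma": Score(mebow=3, info=900),
}
print(flag(scores, mebow_limit=10, info_limit=500))   # ['beta', 'gamma']
print(more_complex(Score(5, 200), scores["alpha"]))   # True
print(more_complex(scores["beta"], scores["gamma"]))  # False
```

Flagging a module when either component exceeds its limit keeps a high control flow score from being masked by a low data flow score, and the reverse.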

Justification  of  Metrics  Selected 

This  section  will  explain  why  the  MEBOW  and  INFO  metrics  should  be 
combined  into  a  hybrid  metric.  First,  a  discussion  of  why  a  hybrid  metric 
should  be  used  is  presented.  This  discussion  is  followed  by  evidence  that  shows 
how  hybrid  metrics  were  better  able  to  measure  complexity  and  maintainability 
than  other  metrics. 

The  following  section  expresses  arguments  given  to  explain  why  MEBOW 
will  be  better  than  either  v(G)  or  KNOT.  This  section  presents  empirical 
evidence  that  the  two  metrics  measure  complexity.  Research  that  supports  the 
use  of  INFO  is  examined  in  the  next  section. 

Hybrid Metric Benefits and Detriments. Chapter Two discussed what a hybrid, or composite, metric is and presented some evidence supporting the use of one. A main topic of that chapter was the suggestion that using metrics from different metric classes together can give greater insight into a module's complexity.

This section expands upon the previous discussion of hybrid metrics. It presents four studies concluding that hybrid metrics can measure complexity better than metrics from a single class. These studies were carried out by Kafura and Canning (1985), Harrison and Cook (1987), Ramamurthy and Melton (1986), and Li and Cheung (1987). The metrics tested and the conclusions drawn from the studies are examined in this section. In the next section, data from the four studies are presented.

Kafura and Canning studied three production software systems written in FORTRAN. The systems came from NASA, and an extensive database of development information was kept for each one for use at the Software Engineering Laboratory. The Software Engineering Laboratory is an organization with three members: NASA/Goddard Space Flight Center, the University of Maryland, and Computer Sciences Corporation. This association monitors the details of software development for later examination. Kafura and Canning used the counts of component errors and the component coding time for each module (Kafura and Canning, 1985:380). They used these data in an attempt to validate ten metrics.

The ten metrics used to analyze the software systems were placed into three groups: code metrics, structure metrics, and hybrid metrics. The three code metrics were LOC, Halstead's effort (E), and v(G). Three of the four structure metrics were not explained in Chapter Two, and since they did not greatly affect the study, they are not considered here. The fourth structure metric was INFO.

Each of the hybrid metrics combined LOC with one of three structure metrics. One of these hybrids was INFO-LOC.

They first obtained results to see whether "significant differences in software metric values are related to corresponding differences in errors and/or effort" (ibid:380). These results showed some correlation between the metric values and the errors and coding times. When the combined coding time and error factors were then compared to the metrics, the correlations improved. This led the researchers to assert that "the observations made above lead us to conclude that growth in the metric values corresponds to increases individually in error proneness and coding time requirements and that this trend becomes more sharply defined when the combination of error and coding time is taken into account. This is both a validation of the metrics and a motivation to use multiple resource (error and coding time data) variables in further validations" (ibid:381).

They  accomplished  the  measurement  of  combined  errors  and  coding  time  by 
separating  their  components  (modules)  into  categories  of  higher  and  lower  errors 
and  coding  time.  Those  components  that  were  high  in  both  error  count  and 
coding  time  were  termed  "difficult,"  and  those  with  low  error  counts  and  coding 
time  were  termed  "easy"  components.  With  these  categories,  they  explained  their 
higher  correlations  with  "it  is  important  to  consider  the  combination  of  these 
factors  because  ...  a  component  with  a  high  metric  value  may  result  in  few 
errors  because  a  large  amount  of  time  was  invested  in  the  coding  of  this 
component" (ibid:381).

These "difficult" components were then sorted into categories by degree of difficulty, according to how many standard deviations a component lay above the mean for number of errors and coding time. An outlier is a component more than one standard deviation above the mean, and an extreme outlier is a component more than two standard deviations above it. The ten metrics were then tested to see how many outlier error components each could identify. Their results indicate that the hybrid metric INFO-LOC identified the outliers and extreme outliers best. They concluded that "this observation is significant because it supports the need to use metrics from all classes, confirms again that structure and code metrics are measuring different properties of software components" (ibid:383). They also suggested the use of a "minimum metric count of 4," which increases the number of outlier error components that can be identified (ibid:384).
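The classification can be sketched as below. The source does not state whether a population or a sample standard deviation was used. It also does not say exactly how the error and coding-time conditions were combined. This sketch assumes a population standard deviation and requires a component to be high in both measures, which matches the "difficult" category described above. The data are hypothetical:

```python
from statistics import mean, pstdev
from typing import List


def level(value: float, mu: float, sd: float) -> int:
    """0 = normal, 1 = more than one SD above the mean, 2 = more than two."""
    if value > mu + 2 * sd:
        return 2
    if value > mu + sd:
        return 1
    return 0


def classify(errors: List[float], hours: List[float]) -> List[str]:
    """Label each component by the lower of its error and coding-time levels,
    so a component is an outlier only if it is high in both."""
    e_mu, e_sd = mean(errors), pstdev(errors)
    h_mu, h_sd = mean(hours), pstdev(hours)
    names = {0: "normal", 1: "outlier", 2: "extreme outlier"}
    return [names[min(level(e, e_mu, e_sd), level(h, h_mu, h_sd))]
            for e, h in zip(errors, hours)]


# Hypothetical data for ten components.
errors = [1] * 9 + [10]
hours = [2] * 9 + [20]
print(classify(errors, hours)[-1])     # extreme outlier
print(classify(errors, [2] * 10)[-1])  # normal (coding time is not high)
```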

This research supports the use of a hybrid metric. The study also raised some issues beyond hybrid metrics. Its outliers were defined by total errors and total coding time. This thesis has not presented coding time as a factor in maintainability, but careful consideration of maintainability would support its use.

Harrison  and  Cook  were  developing  a  metric  that  would  measure  an  entire 
software  system's  complexity.  They  considered  metrics  in  two  categories, 
macrolevel  measures  and  microlevel  measures.  The  macrolevel  measures  consider 
how  the  function  of  each  module  fits  into  the  overall  system.  The  microlevel 
measures consider the detailed operation of a single module. They described a new metric that would embrace both macrolevel and microlevel complexity (Harrison and Cook, 1987:213).

They called their new metric MMC, for Macro/Micro Complexity. This metric has both a macrolevel and a microlevel component. The macrolevel measure uses the number of global and parameter variables used within a module, much as INFO does. It also includes a calculation of the "quality of the subprogram's documentation" (ibid:216), which is simply the ratio of the number of comments to the number of source lines in each module. The microlevel measure they used was v(G). The MMC is the sum of each module's microcomplexity and the amount that module contributes to the overall complexity through its use of data.

This hybrid measure was compared to six other metrics using error data from a 30,000-line compiler project written in C. MMC correlated more highly with the number of errors in each module than any other metric did (ibid:217). One of the other metrics was v(G). Their results show that adding a data flow component to v(G) produces a metric that is better at detecting the modules that will have the most errors. This result also supports the use of a hybrid metric to measure software complexity.

Ramamurthy and Melton examined Hansen's ordered pair of v(G) and operator count and concluded that a combination of Halstead's and McCabe's metrics would make a good metric. They merged the two metrics into a single metric to avoid the problem that arises when the two metrics give conflicting reports about the relative complexities of two modules. They weighted the count of certain operators and operands by the level of nesting, defining a value "C" that is one greater than the nesting level of the current structure. They "call C the cyclomatic complexity of the control structure" (Ramamurthy and Melton, 1986:310). In addition to counting all of the operands and operators, they counted those that were part of a control structure and added the value of the nesting level to the count of that operator or operand. This gave certain operators greater weight than others.

To test these weighted metrics, they calculated the value of the
unweighted  and  the  weighted  metrics  for  a  number  of  test  programs.  These  test 
programs  were  in  three  groups:  programs  with  the  same  Software  Science  values 
but  different  v(G),  programs  with  the  same  v(G)  but  different  Software  Science 
values,  and  a  general  collection  of  programs.  Their  results  showed  that  "the 
weighted  metrics  do  detect  the  complexities  which  the  software  science  metrics 
detect  and  the  complexities  which  the  cyclomatic  number  detects"  (ibid:313). 
These  results  also  support  the  use  of  a  hybrid  metric  from  two  different  classes 
of  metric. 

Li and Cheung compared 31 metrics to see whether generalizations could be made about different classes of metrics. They calculated these metrics for 265 student assignments written in FORTRAN. They made no attempt to compare any of the metrics to an "external validation," such as the number of errors or the time required to code each assignment (Li and Cheung, 1987:707). The study only examined the general relationships among the different metrics and the internal consistency among different measures within the same metric (Software Science).

They examined the correlations among many different metrics, both within the same classes and between classes. One of their conclusions was that a hybrid metric would be very useful. They said:

In general, the control flow metrics fail to be comprehensive and do
not consider the contribution of any factor except control flow
complexity. However, these metrics can differentiate between two
programs of similar VOLUME metrics and certainly are related to the
software quality. Hence, a useful approach is to use VOLUME
metrics for prior classification and then to use CONTROL
ORGANIZATION measures to evaluate the programs in detail (Li and
Cheung, 1987:707).
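Li and Cheung's two-stage approach can be sketched as follows. The quoted passage does not specify the volume bands, the band width, or the use of a single control organization score. All three are assumptions made here for illustration, and the programs are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify_then_rank(programs: List[Tuple[str, float, float]],
                       bucket_size: float) -> Dict[int, List[str]]:
    """Group (name, volume, control) records into volume bands, then order each
    band from least to most complex control organization."""
    bands: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for name, volume, control in programs:
        bands[int(volume // bucket_size)].append((control, name))
    return {band: [name for _, name in sorted(members)]
            for band, members in sorted(bands.items())}


# Hypothetical programs: a and b have similar volume but differ in control flow.
programs = [("a", 120, 5), ("b", 130, 2), ("c", 480, 1)]
print(classify_then_rank(programs, bucket_size=100))  # {1: ['b', 'a'], 4: ['c']}
```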

In a different study, Kafura and Reddy reached the same conclusion. They stated that their study "has also confirmed the results obtained in previous work with respect to the distinction between the code and structure metrics. This distinction was evident in that the maintenance changes to components might dramatically alter the values of metrics in one class of metrics without changing materially the values of metrics in the other class" (Kafura and Reddy, 1987:342).


The results of these four studies suggest that the use of a hybrid metric is a useful technique for measuring complexity. Appendix C, Empirical Support for Hybrid Metrics, presents the data that three of these studies generated to support their conclusions.

MEBOW. MEBOW was designed as a comprehensive control flow complexity metric without the deficiencies of other popular metrics. No empirical evidence or examples were offered to show that this proposed metric measured complexity better than the measures it was meant to replace. Instead, the metric was shown to meet twelve "precisely-stated intuitive properties expected of any control flow complexity metric" (Jayaprakash and others, 1987:238). These properties included language independence, ranking of basic control constructs, and sensitivity to nesting, which the previous chapter presented as metric selection criteria. None of the other control flow complexity metrics presented (v(G), KNOT, and SCOPE ratio) satisfied all twelve properties. Jayaprakash, Lakshmanan, and Sinha explained why these properties matter: "the idea is that if a control flow complexity metric fails to satisfy these intuitive properties, any extent of empirical evidence supporting its use in estimating the maintenance cost of software, or predicting the number of errors in the program, etc., cannot really provide enough confidence for its widespread use in practice" (ibid:238).

They consider v(G) to be only a special case of MEBOW, in which each
branch is counted as one and a "constant bias" of 2 (for 2p) is added to
calculate the value (ibid:240). KNOT is also considered a special case of
MEBOW, in which knots are given a weight of one and all other control flow
factors are ignored. Because these two metrics are encompassed by MEBOW, they
believe that MEBOW will measure complexity better than either metric could
alone. They state, "it appears, therefore, that by suitably assigning the
relative weights to the factors stated above, it is possible to arrive at a
complexity metric which combines the strengths of the existing measures"
(ibid:240).
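For reference, the sketch below computes McCabe's cyclomatic number in its usual form, v(G) = e - n + 2p, where e is the number of edges, n the number of nodes, and p the number of connected components of the flowgraph. The edge list is the thirteen-branch flowgraph of the Figure 12 example. The function names are illustrative only. As a cross-check, it also counts decisions, since for a single-entry, single-exit flowgraph v(G) equals the number of decisions plus one.

```python
from collections import Counter

# Flowgraph of the Figure 12 example as (from_block, to_block) pairs.
FIG12_EDGES = [(1, 2), (1, 3), (2, 6), (3, 4), (3, 5), (4, 10), (5, 11),
               (6, 7), (6, 8), (7, 9), (8, 5), (9, 11), (10, 11)]


def cyclomatic(edges, components=1):
    """McCabe's v(G) = e - n + 2p for a flowgraph given as an edge list."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


def decisions_plus_one(edges):
    """Cross-check: a node with k > 1 exits contributes k - 1 decisions."""
    out_degree = Counter(src for src, _ in edges)
    return 1 + sum(k - 1 for k in out_degree.values() if k > 1)


print(cyclomatic(FIG12_EDGES))          # 4
print(decisions_plus_one(FIG12_EDGES))  # 4
```

Both calculations give 4, one more than the three IF statements in the program.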

Jayaprakash, Lakshmanan, and Sinha argue that MEBOW is better than either
v(G) or KNOT. For comparison, some evidence of how well these two metrics
measure complexity is given here. Many studies have attempted to determine
how well v(G) measures complexity, and some of their conclusions and results
are presented. KNOT has not been studied in as much detail, but some results
for KNOT are also shown.

Evidence Supporting the Use of v(G). In a pair of experiments,
Curtis attempted to measure the psychological complexity of software
maintenance tasks by comparing Halstead's effort (E) and v(G) with the actual
performance of programmers on two software maintenance tasks. The
programmers' performance was measured in two ways. The first was based on
the premise that a good measure of a programmer's understanding of a program
is his ability to learn its function and reproduce an equivalent program without
notes. This performance was measured by the "functional correctness of each
separately reconstructed statement" (Curtis and others, 1980a:297). The second
performance criterion was how correctly a requested change was
implemented and the time taken to perform the modification. Measurements of the
accuracy of implementation and time to completion were correlated with the
modules' E and v(G) values. Although the correlations were not high, the
study concluded that "the two experiments comprising this study produced
empirical evidence that software complexity metrics were related to difficulty
programmers experienced in understanding and modifying software" (ibid:301).
These conclusions are questionable, as the correlations for v(G) ranged from
-.55 to -.21 in the first experiment, and from -.36 to .38 in the second
experiment.

In a later experiment, Curtis found v(G) to be a better predictor of
programmer performance. This experiment measured how long each of 54
professional programmers took to find and correct a single error in three
separate FORTRAN programs. The correlations between v(G) and the average
performance were reported as .53 for single subroutines and .65 for total
programs. These correlations were much stronger than in the previous
experiments. These results "demonstrated that far stronger results could be
obtained when the limitations in our earlier experimental procedures were
overcome. For instance, our previous research was conducted exclusively on
small-sized (35-55 lines of code) programs, which seems to have limited those
results..." (Curtis and others, 1980b:307).

McCabe stated that a v(G) of ten is a reasonable upper limit for a single
procedure, and that if complexity exceeds ten, the procedure should be decomposed
into smaller procedures. Walsh studied software developed for the AEGIS Naval
Weapon System radar to determine whether procedures with higher v(G) had a higher
number of errors. He quickly found a correlation between procedures
with a high v(G) and the occurrence of errors in those procedures. But he saw
that these were also the largest procedures in terms of lines of code. To
determine whether the number of decisions within a procedure had an impact, and
whether v(G) was measuring more than size, Walsh separated the modules with v(G)
of ten or more from those with a lower v(G) and compared their relative error counts.
He found that the procedures with v(G) of ten or more had 21 percent more
errors per 100 lines of code than those with a smaller v(G). Those procedures
with higher v(G) averaged 5.60 errors per 100 source statements, while the
others averaged 4.59 errors (Walsh, 1983:95). This suggests that v(G) is
measuring something more than just the size of a procedure, and that it is
valuable in predicting the error rate of software. Walsh explained his numbers
by stating:

As the number of detected errors in a piece of software increases,
the probability of the existence of more undetected errors also
increases. Put simply, errors come in clusters. Thus, it can be
confidently predicted that when the procedures in the study enter
the maintenance phase of their existence, the procedures with a
complexity greater than or equal to ten will continue to experience
higher error rates than those procedures with complexity below ten
(Walsh, 1983:95-96).
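Walsh's comparison amounts to splitting procedures at a v(G) threshold and comparing error densities. The sketch below shows that calculation on hypothetical data; the procedure records are invented for illustration and are not Walsh's AEGIS data. Pooling errors and statements within each group is an assumption, since Walsh may instead have averaged per-procedure rates.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    name: str
    vg: int          # cyclomatic complexity v(G)
    statements: int  # source statements
    errors: int      # errors recorded against the procedure


def errors_per_100(procs):
    """Pooled error density: total errors per 100 source statements."""
    return 100.0 * sum(p.errors for p in procs) / sum(p.statements for p in procs)


def compare_at_threshold(procs, threshold=10):
    """Return (density for v(G) >= threshold, density for v(G) < threshold)."""
    high = [p for p in procs if p.vg >= threshold]
    low = [p for p in procs if p.vg < threshold]
    return errors_per_100(high), errors_per_100(low)


# Hypothetical procedures, for illustration only.
sample = [
    Procedure("P1", 14, 200, 12),
    Procedure("P2", 11, 150, 8),
    Procedure("P3", 4, 120, 5),
    Procedure("P4", 6, 180, 7),
    Procedure("P5", 2, 50, 2),
]
high_rate, low_rate = compare_at_threshold(sample)
print(f"v(G) >= 10: {high_rate:.2f}  v(G) < 10: {low_rate:.2f}")
print(f"relative increase: {high_rate / low_rate - 1:.0%}")
```

Because the comparison uses errors per statement rather than raw error counts, a difference between the two groups cannot be explained by procedure size alone.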

Harrison and Cook's macrocomplexity and microcomplexity metrics were
described earlier. Their correlations of v(G) with error occurrence and with other
metrics are presented in Figure 15 within Appendix C. They also compared how
well each metric was able to identify the most error-prone and least error-prone
modules. The modules were listed from most errors discovered to least, and a
comparison was made of how well the metrics were able to rank order the
twenty modules. The correlation between v(G)'s ranking of all the modules
and their actual error ranking was only .50, but for a ranking of just the six
most error-prone and six least error-prone modules it was .81. Both numbers were
the middle scores for the seven metrics measured. A conclusion that Harrison
and Cook drew from these data is "this suggests that the metrics work quite well
in identifying the 'extraordinary cases,' but do a relatively poor job of
distinguishing among modules which do not fit into one of these 'extraordinary'
categories (i.e., either few errors or many)" (Harrison and Cook, 1987:218). If a
software manager had this type of error prediction data from v(G), he could
spend more resources testing the more complex modules and reduce the resources
spent on those modules that are determined to be less complex.
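Harrison and Cook's ranking test can be approximated with Spearman's rank correlation. The source does not name the statistic, so using Spearman here is an assumption. The sketch computes the correlation once for all modules and once for only the k least and k most error-prone modules, which mirrors the "extraordinary cases" comparison. The error counts and metric values are hypothetical, and the simple formula used requires that there be no tied values.

```python
def ranks(values):
    """Rank positions 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); no-ties version."""
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("this sketch assumes no tied values")
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def extremes(errors, metric, k):
    """Keep only the k least and k most error-prone modules."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    keep = order[:k] + order[-k:]
    return [errors[i] for i in keep], [metric[i] for i in keep]


# Hypothetical data: errors found and metric value for eight modules.
errors = [20, 15, 11, 9, 6, 4, 2, 1]
metric = [50, 44, 18, 22, 27, 31, 9, 5]
print(round(spearman(errors, metric), 3))                # all modules
print(round(spearman(*extremes(errors, metric, 2)), 3))  # extremes only
```

In this invented data the metric misorders the four middle modules but ranks the extremes perfectly, so the extremes-only correlation is higher, the same pattern Harrison and Cook reported.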

Shepperd studied v(G) and the results of other research and came to a
different conclusion about the metric than other researchers. He felt the metric
is based on "poor theoretical grounds and an inadequate model of software
development" (Shepperd, 1988:30). He also disputes the metric's empirical
validation results. He rejects the idea that intuitive appeal should be part of a
metric's validation and dismisses intuition outright as a factor in the
consideration of the metric. While it is understood that a metric's intuitive appeal
should not be its only justification for use, this factor should not be
offhandedly dismissed.

Shepperd presented a list of theoretical objections. One is that
"the treatment of case statements has also been subject to disagreement"
(ibid:32). He notes that different researchers have used different counting
strategies to count the decisions in a case statement. This is one reason that
a counting strategy should be rigorously defined and adhered to. Shepperd also
points out that v(G) cannot measure the complexity of sequential statements.
This is certainly a valid criticism, and it is why others have added a size
complexity metric to v(G).

According to some researchers mentioned by Shepperd, "applying generally
accepted techniques to improve program structure" can actually increase v(G)
(ibid:32). This is because the metric is insensitive to the unstructuredness of a
program: it only counts decisions and does not reflect whether a decision causes
jumping out of loops or into another decision, which are generally considered to
be unstructured techniques. This is a reason why KNOT should be added to
v(G), so that less structured techniques will produce higher complexity values.
This is one of the justifications for MEBOW. As was mentioned in Chapter Two,
Shepperd believes that v(G) does not measure inter-module complexity well.
Since this appears to be a correct appraisal, this criticism is one reason why
MEBOW calculation is being suggested for intra-module use only.

Shepperd refers to some data that Evangelist reported showing that "the
application of only 2 out of 26 of Kernighan and Plauger's rules of good
programming style invariably results in a decrease in cyclomatic complexity."
This contradicts Myers' results comparing the v(G) of more and less
structured code from Kernighan and Plauger's The Elements of Programming Style.
According to Myers' calculations, v(G) was always lower for what was
subjectively considered the more structured code from their examples (Myers,
1977:64).

Shepperd reasoned that even though he did not agree with the theoretical
justifications for v(G), it would still be a useful metric if it could be shown to
accurately measure complexity: "The theoretical objections to the metric, that it
ignores other aspects of software such as data and functional complexity, are
not necessarily fatal. It is easy to construct certain pathological examples, but
this need not invalidate the metric if it is possible to demonstrate that in
practice it provides a useful engineering predictor of factors that are associated
with complexity" (ibid:33). He then proceeded to condemn other researchers'
experimental methods, statistical correlation techniques, and results, ending with
the sweeping statement that "to summarize, many of the empirical validations of
McCabe's metric need to be interpreted with caution" (ibid:34). His points were
well made, but he did not refute all of the results he presented that suggest
that v(G) can identify error-prone modules. His contention that the metric's
failure to measure data flow complexity is a problem is a further
argument that v(G) should be combined with a data flow complexity metric such
as INFO.

Evidence Supporting the Use of KNOT. Woodward, Hennell, and
Hedley did not show any data that compared KNOT to error data or to the time
it took programmers to modify a module. Instead, they showed examples of two
code fragments that performed the same function and stated that the source code
with the smaller KNOT count was more structured. These same examples showed
that v(G) did not change. While the source code with fewer knots was typically
shorter and had fewer branches than the example with more knots, one example
they showed had a lower KNOT count for unstructured code, while the structured
version had a higher KNOT count (Howatt, 1988).

One difference between the KNOT count and v(G) is that the KNOT count does
not measure the structure of the program's control flow. Instead, it measures
the structure of the source code. This can be considered a benefit, because a
program's maintainer will work with the source code and not a flow graph
(Howatt, 1988).
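A knot count can be sketched directly from the jumps in a listing. Each jump is a pair (from_line, to_line). Following the usual statement of Woodward, Hennell, and Hedley's definition, two jumps (a, b) and (c, d) form a knot when their line ranges interleave, that is, when min(a,b) < min(c,d) < max(a,b) < max(c,d), or the same holds with the two jumps swapped. Jumps that share an end line are not counted here, which is a simplifying assumption. The jump list is hypothetical.

```python
from itertools import combinations


def is_knot(jump_a, jump_b):
    """True if the line ranges of two jumps interleave (cross each other)."""
    a_lo, a_hi = sorted(jump_a)
    b_lo, b_hi = sorted(jump_b)
    return a_lo < b_lo < a_hi < b_hi or b_lo < a_lo < b_hi < a_hi


def knot_count(jumps):
    """Count crossing pairs among (from_line, to_line) jumps."""
    return sum(1 for a, b in combinations(jumps, 2) if is_knot(a, b))


# Hypothetical jumps: (line of GO TO, line of its target label).
jumps = [(3, 9), (5, 12), (10, 2), (14, 16)]
print(knot_count(jumps))  # 2: (3,9) crosses (5,12); (5,12) crosses (10,2)
```

Because the count works on line numbers rather than flowgraph nodes, it reflects the layout of the source text, which is the property discussed above. Nested jumps such as (3, 9) inside (10, 2) do not form a knot.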

Considering that a more structured version of a program is better than a
less structured version, adding KNOT to v(G) seems to be a better way to
measure a program's structuredness than v(G) alone, while still indicating how
difficult the program will be to test. Figure 10 shows a more structured version of
the code fragment from Figure 7 (in Chapter Two). These code fragments are
identical in function, but this fragment is more structured than the other. The
v(G) of both is three, but the second version has only three knots and has no
backward jumps. Discussing the benefits of KNOT, Woodward and his colleagues
state "we feel that the
knot count provides a much clearer indication of program readability... The high
knot counts for these (two other) routines confirm not only the visual
impression of high complexity but also the difficulty actually encountered in
translating them to other languages" (Woodward and others, 1983:105).

In Li and Cheung's comparison with 17 other metrics, they assert that v(G)
"correlates well with Halstead's, Gilb's, Knot Counts, SCOPE, EDGES, and NODES
metrics. So, the cyclomatic complexity metric seems to bridge the gap between
the two categories: VOLUME and CONTROL ORGANIZATION metrics" (Li and Cheung,
1987:705). Their data show v(G) correlations in the range of .796 to .971
with the 17 other metrics (ibid:704). Knot count has correlations in the
range of .799 to .948 with the same metrics. These correlations suggest that
these two metrics measure complexity as well as any other established metrics.
CALL TPR
IF (ZR) 500, 500, 100
CALL TED
IF (Z3) 200, 200, 550
ZG = ZG + 1
ZC = 0
CALL TCO
GOTO 600
Z3 = 1
CALL TEC
ZB = ZB + 1
ZC = ZC + 1
CALL TRA
RETURN
END

^ = KNOT

Figure 10. More Structured Knot Example
(Woodward and others, 1983:104)


In this section, much conflicting data and many conflicting interpretations have
been presented. While McCabe's cyclomatic complexity metric is now
popular, the metric has some obvious limitations. It appears that many of the
theoretical objections that Shepperd has against v(G), such as its inability to
measure structuredness or data flow, would be remedied by adding the KNOT
count through MEBOW and by using INFO to measure data flow. A well-defined
counting strategy will lessen the problem of researchers measuring the
same modules differently because they are not counting MEBOW in the same
fashion. Overall, it appears that v(G) lays a good foundation for the
measurement of control complexity, and MEBOW improves upon this foundation.
In conclusion, "several variations (for measuring) the cyclomatic complexity
metric have shown very encouraging potential for usefulness as measures of
software product quality" (Basili and Reiter, 1980:287).


Information Flow. Information flow was described in Chapter Two, and an
example was given for INFO calculation. This section presents further evidence
that INFO reliably reflects complexity, in the form of data flow complexity.
Basili and Perricone explain why this data complexity is an important factor to
measure when, referring to where errors occur most often in a program, they
state that "interfaces appear to be the major problem, regardless
of the module type" (Basili and Perricone, 1984:47).

One  of  the  earliest  experiments  with  INFO  was  done  by  Henry,  Kafura,  and 
Harris  (Henry  and  others,  1983:125).  Values  for  INFO,  v(G),  and  Halstead's  E 
were  calculated  for  source  code  modules  from  the  Unix  operating  system  and 
were  then  compared  to  a  list  of  errors  found  during  the  system's  development. 
The  three  metrics  were  also  compared  against  each  other  to  see  if  they  appeared 
to  measure  the  same  factors.  Their  results  are  summarized  in  Figure  11.  The 
formula  used  to  calculate  INFO  is  shown  as  (5)  in  Chapter  Two. 


            E       v(G)     INFO
Errors     .89      .96      .95
E                  .8411    .3830
v(G)                        .3459

Figure  11.  A  Comparison  of  Three  Metrics 
(Henry  and  others,  1983:130) 


These results suggest two conclusions. The first is that INFO is a useful
measure of complexity, as a high correlation was found with detected errors.
The second is that INFO measures different factors than Halstead's E and v(G),
as the correlations between INFO and the other two metrics were small. Henry
explained this result: "the information flow complexity measurement is
orthogonal to the other two metrics since it has a low correlation to both
Halstead's and McCabe's metrics. The independence of the information flow
metric is explained by its greater concentration on the manner on which system
components are interconnected" (ibid:130).

Harrison and Cook's comparison of INFO (which they called HNK) with other
metrics and errors appears in Figure in Appendix C. The correlation between
INFO and errors, .62, is not as high as in Henry's results.
An obvious cause of this discrepancy is that they did not have access to all of
the information normally used to calculate INFO. Instead, they used the
formula:

$(\text{fan-in} + \text{fan-out})^2 \times \text{length} \qquad (11)$

Apparently, summing the data factors instead of using their product weakened
their influence in the metric calculation. As a result, the correlations with
E and v(G) were greater than before, but the correlation with the error
count was weaker.
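The difference between the two forms is easy to see numerically. The sketch below assumes that formula (5) is the standard Henry and Kafura form, length × (fan-in × fan-out)², and compares it with Harrison and Cook's formula (11). The module values are hypothetical.

```python
def info_henry_kafura(length, fan_in, fan_out):
    """Assumed form of formula (5): length * (fan-in * fan-out)^2."""
    return length * (fan_in * fan_out) ** 2


def info_harrison_cook(length, fan_in, fan_out):
    """Formula (11): (fan-in + fan-out)^2 * length."""
    return (fan_in + fan_out) ** 2 * length


# Hypothetical modules: (length, fan-in, fan-out).
for length, fan_in, fan_out in [(20, 3, 4), (20, 6, 8), (20, 0, 7)]:
    print(length, fan_in, fan_out,
          info_henry_kafura(length, fan_in, fan_out),
          info_harrison_cook(length, fan_in, fan_out))
```

Doubling both fan-in and fan-out multiplies the product form by 16 but the sum form only by 4, and a module with no incoming flows scores zero under the product form but not under the sum. Under these assumptions the sum form leaves length with relatively more influence, which would be consistent with its closer agreement with E and v(G).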

INFO, like v(G), was used in an attempt to rank order the most and least
error-prone modules used in Harrison and Cook's study. INFO had a .55
correlation with the ranking of all twenty modules used. This is slightly
greater than the .50 correlation obtained by v(G). Measuring just the six most
error-prone modules and six least error-prone modules, INFO had a .77
correlation, which is somewhat smaller than the v(G) result of .81 (Harrison and
Cook, 1987:218).

Recalling Kafura and Canning's work with INFO, Figure 14 in Appendix C
shows that INFO was able to identify extreme outlier error components better
than any non-hybrid metric used. INFO also identified resource outliers better
than any other metric except LOC. INFO correctly identified 38 of 85 error and
coding time outliers (Kafura and Canning, 1985:383).

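Rodriguez and Tsai used four metrics to determine the complexity of two
medium-sized "system implementation packages" (Rodriguez and Tsai, 1986:369).
The metrics used were INFO, LOC, v(G), and Halstead's volume metric, which is
defined as (Conte and others, 1986:42):

$V = N \log_2 \eta \qquad (12)$

where N is the program length (the total number of operator and operand
occurrences) and η is the vocabulary (the number of distinct operators and
operands).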
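Formula (12) can be applied once a program has been split into operator and operand tokens. That split is itself a counting-strategy decision; the sketch below assumes a hand tokenization in which "=" and "+" are operators and names and constants are operands.

```python
import math


def halstead_volume(operators, operands):
    """V = N * log2(eta): N counts all tokens, eta counts distinct ones."""
    length = len(operators) + len(operands)                # N
    vocabulary = len(set(operators)) + len(set(operands))  # eta
    return length * math.log2(vocabulary)


# The statement ZB = ZB + 1 from Figure 10, tokenized by hand.
ops = ["=", "+"]
opnds = ["ZB", "ZB", "1"]
print(halstead_volume(ops, opnds))  # 10.0
```

Here N = 5 and η = 4, so V = 5 × log2 4 = 10.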

Just as with the Harrison and Cook experiment, INFO was not calculated
as Henry, Kafura, and Harris suggest it should be. Rodriguez and Tsai explain, "as
a result of our approach, the definition of fan-in and fan-out given by Henry
and Kafura has to be revised" (Rodriguez and Tsai, 1986:370). They show that
if global variables are modified within a local procedure, they are not counted
in the data flow of the overall procedure.

Using this (Henry and others') formula, no good correlations of the
metrics are found against modifications and errors. However, using
Halstead's ideas, an adaptation leads us to formulate the complexity
of a procedure as:

$\text{length} \times \ln(\text{fan-in} \times \text{fan-out})$

Using this definition of complexity, the correlations found are
improved considerably (ibid:370).

These four metrics were compared to the number of modifications reported
for each module throughout the development and maintenance of the two
programs. These modifications were adaptive and perfective, rather than
corrective, in nature. Rather than determining whether each metric could identify
the most-modified modules, the study showed how much of the variation in the
number of modifications the metrics explained, individually and combined. Their
results showed that INFO explained 80.297% of the variation in modifications,
while INFO with LOC could explain 84.994% of the variation. With all four metrics
combined, 87.257% of the variation in modifications could be explained. They
concluded "we have to keep in mind that the high regression coefficient (0.87257) shows
that meaningful relationships exist between each metric taken individually or
jointly and the index of errors of modifications to the software" (ibid:371).
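The percentages quoted above are proportions of explained variation, that is, the coefficient of determination R² of a regression. The quoted "regression coefficient" of 0.87257 matches the 87.257% figure, so it appears to be R² as well. Rodriguez and Tsai combined several metrics in a multiple regression. The sketch below covers only the single-predictor case, on hypothetical data, to show what "explains 60% of the variation" measures.

```python
def r_squared(x, y):
    """Share of the variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


metric_values = [1, 2, 3, 4, 5]   # hypothetical metric scores
modifications = [2, 4, 5, 4, 5]   # hypothetical modification counts
print(f"{r_squared(metric_values, modifications):.1%} of variation explained")
```

Adding predictors to a least-squares fit can never lower R², so the rise from 80.297% to 87.257% as metrics are added is expected even if the extra metrics add little.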

Further analysis was performed to see if they could determine a module size
threshold below which the four metrics would not correlate well with errors
or modifications. Their final conclusions were "all four metrics are useful
indicators of the occurrence of errors or future modifications of software units,
when the unit size exceeds some threshold. For our study cases, that threshold
is 75 lines of code" (ibid:374).

Kitchenham performed a study that compares v(G), LOC, and an INFO-like
metric called Information Linkage (IL) on the 226 modules of a communications
program. As INFO does, IL considers the number of data flows into and out of a
procedure. The number of procedures that call the current procedure is added,
as is the number of procedures the current procedure calls. These factors are
added, instead of multiplied as in INFO. Kitchenham compares the three
metrics with the number of perfective changes and the number of corrective
changes made to the communications system. The percentage of error-prone and
change-prone procedures that IL identified was lower than the percentages
identified by the LOC and v(G) metrics. She suggests that these results directly
contradict Kafura and Canning's results, although Kafura and Canning calculated
INFO much differently than she did (Kitchenham, 1988:374).

Although these results do not support the use of INFO to the extent that
other studies do, some interesting conclusions were given.

The results of this study suggest that it might be a cost-effective
procedure to apply more stringent development procedures to
programs with high fan-out values. Extra time spent on 13% of the
programs would have been 63% efficient (since 63% of the programs
identified warranted additional development time), but would have
only been 24% effective (since only 24% of the programs which
warranted additional development time would have been identified).
Extra time spent on a randomly selected 13% of programs would
have been 33% efficient and 13% effective (Kitchenham, 1988:375).
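Kitchenham's "efficient" and "effective" correspond to what are now commonly called precision and recall. The sketch below uses a hypothetical population of 100 programs, chosen so that, as in her example, 33% warrant extra time and 13% are flagged; the flagged set itself is invented. It also shows why random selection gives 33% efficiency and 13% effectiveness: the expected efficiency equals the base rate, and the expected effectiveness equals the fraction selected.

```python
def efficiency(flagged, warranted):
    """Fraction of flagged programs that really warranted extra time."""
    return len(flagged & warranted) / len(flagged)


def effectiveness(flagged, warranted):
    """Fraction of programs warranting extra time that were flagged."""
    return len(flagged & warranted) / len(warranted)


def random_baseline(total, n_warranted, n_flagged):
    """Expected (efficiency, effectiveness) when programs are flagged at random."""
    return n_warranted / total, n_flagged / total


# Hypothetical population of 100 programs, 33 of which warrant extra time.
programs = set(range(100))
warranted = set(range(33))
flagged = set(range(8)) | set(range(50, 55))   # 13 flagged, 8 correctly
print(round(efficiency(flagged, warranted), 3),
      round(effectiveness(flagged, warranted), 3))
print(random_baseline(len(programs), len(warranted), len(flagged)))
```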

Metric Implementation Considerations

The previous section explained in some detail why hybrid metrics are
useful, and presented the results of studies that show how hybrid metrics can
measure complexity well. Then MEBOW was described and evidence supporting
the use of v(G) and KNOT was given. Finally, study results favoring INFO's use
were presented. This section explains some issues of metric implementation, such
as counting strategies and the determination of threshold values.

Calculation of Metric Value. Figure 12 shows a sample FORTRAN program
that reads three numbers and writes the greatest of the three. Basic
blocks, which are the straight-line segments of code, are numbered and
separated by lines within the figure. A flowgraph of the program is presented
next to the program. Following Jayaprakash's terminology, blocks that have
only one source statement are represented in the flowgraph as horizontal lines,
and blocks that contain more than one statement are represented as circles.
The implicit branches are dotted lines, and explicit branches are solid lines.

According to the definition in Chapter Two, MEBOW is calculated by
counting branches and KNOTS and adding their respective weights. The raw
weights of the branch types are:

1. Implicit forward branch = 1

2. Implicit backward branch = 3

3. Explicit forward branch = 2

4. Explicit backward branch = 6

These values represent the MEBOW developers' contention that an explicit
branch is twice as harmful as an implicit branch, and a backward branch is
three times as harmful as a forward branch. According to the developers, "it is
important to note that the weights associated with each of those entities are
only relative to each other and their actual values are of no significance"
(Jayaprakash and others, 1987:240-241). They also stated that "it also seems
meaningful to assign a relatively high weight to each knot since it normally
represents an unstructuredness in the program" (ibid:241). Because each KNOT
involves two branches, the weight of a KNOT is calculated as the sum of the two
branches' weights. If two implicit branches intersect, as in an IF...THEN...ELSE
statement, this is not considered a KNOT by MEBOW.

To each branch is added its scope weight. According to Jayaprakash, this
"provides a means for recognizing branches with remote targets which when
suitably accounted for, can help the complexity metric satisfy properties relating
to nesting" (ibid:239). This scope weight is the weight of the subgraph
that is branched around. For a forward branch from block A to block B, the
scope is the subgraph from node A + 1 to node B - 1. This represents the
nodes between A and B and the branches "whose both end points are within the
same set of nodes" (ibid:239). For a backward branch from B to A, any
branches that include the nodes A or B are also counted. An example of a
forward branch from Figure 12 is the branch from node 4 to node 10,
represented as (4,10). The scope of this branch encompasses the four branches
(6,7), (6,8), (7,9), and (8,5), and the two KNOTS [(6,8), (7,9)] and [(7,9), (8,5)].
Therefore, the weight of (4,10) is equal to its raw weight plus the weights of its
interior branches and KNOTS.
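The scope rule can be expressed as a filter over the branch list. The sketch below encodes the forward rule as stated (both end points in nodes A + 1 through B - 1). For a backward branch from B to A it uses one reading of the rule, namely that both end points must lie in nodes A through B inclusive, excluding the branch itself. That reading is an assumption, but it reproduces the scopes used in the Figure 12 calculation.

```python
# Branches of the Figure 12 flowgraph as (from_block, to_block).
BRANCHES = [(1, 2), (1, 3), (2, 6), (3, 4), (3, 5), (4, 10), (5, 11),
            (6, 7), (6, 8), (7, 9), (8, 5), (9, 11), (10, 11)]


def scope(branch, branches):
    """Branches lying entirely inside the region a branch jumps over."""
    source, target = branch
    if target > source:            # forward: nodes source+1 .. target-1
        low, high = source + 1, target - 1
    else:                          # backward (assumed): nodes target .. source
        low, high = target, source
    return [b for b in branches
            if b != branch and low <= min(b) and max(b) <= high]


print(scope((4, 10), BRANCHES))  # [(6, 7), (6, 8), (7, 9), (8, 5)]
print(scope((8, 5), BRANCHES))   # [(6, 7), (6, 8)]
```

The KNOTS inside a scope are then those whose two branches both appear in the returned list, for example [(6,8), (7,9)] and [(7,9), (8,5)] for branch (4,10).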

Figure 12 shows a program with eleven blocks, thirteen branches, and ten
KNOTS. The MEBOW calculation for Figure 12 is as follows:
[1]          INTEGER A,B,C
             READ 100, A,B,C
      100    FORMAT (3I4)
             IF (A .GT. B)
[2]          GO TO 10
[3]          IF (B .GT. C)
[4]          GO TO 20
[5]   40     PRINT 200, C
      200    FORMAT (I5)
             GO TO 50
[6]   10     IF (A .GT. C)
[7]          GO TO 30
[8]          GO TO 40
[9]   30     PRINT 200, A
             GO TO 50
[10]  20     PRINT 200, B
[11]  50     STOP
             END

Figure  12.  Example  MEBOW  Calculation 
(Jayaprakash  and  others,  1987:239) 


Branches:

(1,2)   =  1   (implicit forward branch)
(1,3)   =  1   (implicit forward branch)
(2,6)   =  4   (explicit forward branch = 2,
                scope covers (3,4) and (3,5) = 2)
(3,4)   =  1   (implicit forward branch)
(3,5)   =  1   (implicit forward branch)
(4,10)  = 27   (explicit forward branch = 2,
                scope covers (6,7), (6,8), (7,9), (8,5) = 12 and
                KNOTS [(6,8), (7,9)] and [(7,9), (8,5)] = 13)
(5,11)  =  9   (explicit forward branch = 2,
                scope covers (6,7), (6,8), (7,9) = 4 and
                KNOT [(6,8), (7,9)] = 3)
(6,7)   =  1   (implicit forward branch)
(6,8)   =  1   (implicit forward branch)
(7,9)   =  2   (explicit forward branch)
(8,5)   =  8   (explicit backward branch = 6,
                scope covers (6,7) and (6,8) = 2)
(9,11)  =  2   (explicit forward branch)
(10,11) =  1   (implicit forward branch)

Branch total = 59

Knots:

[(1,3), (2,6)]    =  5
[(2,6), (4,10)]   = 31
[(2,6), (5,11)]   = 13
[(2,6), (8,5)]    = 12
[(3,5), (4,10)]   = 28
[(4,10), (5,11)]  = 36
[(4,10), (9,11)]  = 29
[(5,11), (8,5)]   = 17
[(6,8), (7,9)]    =  3
[(7,9), (8,5)]    = 10

Knot total = 184

MEBOW = 59 + 184 = 243


This  example  shows  MEBOW  calculation,  but  because  the  program  has  no 
external  interconnections,  it  has  an  INFO  value  of  0.  An  example  for  both 
MEBOW  and  INFO  is  shown  in  Appendix  D,  Calculation  of  Metric  Value  for  Ada 
Procedure. 
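The arithmetic of the Figure 12 calculation can be checked with a short script. It encodes the raw weights listed earlier (an explicit branch weighs twice an implicit one, and a backward branch three times a forward one), takes the scope contents directly from the worked figure rather than deriving them, and defines a KNOT's weight as the sum of its two branches' total weights. All names are illustrative. Under those assumptions it reproduces the branch total of 59, the knot total of 184, and MEBOW = 243.

```python
from functools import lru_cache

BRANCHES = [(1, 2), (1, 3), (2, 6), (3, 4), (3, 5), (4, 10), (5, 11),
            (6, 7), (6, 8), (7, 9), (8, 5), (9, 11), (10, 11)]
EXPLICIT = {(2, 6), (4, 10), (5, 11), (7, 9), (8, 5), (9, 11)}  # GO TO jumps
KNOTS = [((1, 3), (2, 6)), ((2, 6), (4, 10)), ((2, 6), (5, 11)),
         ((2, 6), (8, 5)), ((3, 5), (4, 10)), ((4, 10), (5, 11)),
         ((4, 10), (9, 11)), ((5, 11), (8, 5)), ((6, 8), (7, 9)),
         ((7, 9), (8, 5))]
# Scope contents (branches, knots) copied from the worked figure.
SCOPE = {
    (2, 6): ([(3, 4), (3, 5)], []),
    (4, 10): ([(6, 7), (6, 8), (7, 9), (8, 5)],
              [((6, 8), (7, 9)), ((7, 9), (8, 5))]),
    (5, 11): ([(6, 7), (6, 8), (7, 9)], [((6, 8), (7, 9))]),
    (8, 5): ([(6, 7), (6, 8)], []),
}


def raw_weight(branch):
    """Implicit forward 1, explicit forward 2; backward branches tripled."""
    source, target = branch
    weight = 2 if branch in EXPLICIT else 1
    return 3 * weight if target < source else weight


@lru_cache(maxsize=None)
def branch_weight(branch):
    """Raw weight plus the weights of the branches and knots in its scope."""
    inner_branches, inner_knots = SCOPE.get(branch, ([], []))
    return (raw_weight(branch)
            + sum(branch_weight(b) for b in inner_branches)
            + sum(knot_weight(k) for k in inner_knots))


def knot_weight(knot):
    """A knot weighs as much as its two branches together."""
    first, second = knot
    return branch_weight(first) + branch_weight(second)


branch_total = sum(branch_weight(b) for b in BRANCHES)
knot_total = sum(knot_weight(k) for k in KNOTS)
print(branch_total, knot_total, branch_total + knot_total)  # 59 184 243
```

Because a KNOT reuses the full weights of its branches, including their scopes, the long forward jump (4,10) dominates the total: it appears in four of the ten KNOTS.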

Some issues that have not yet been addressed are how to count compound
conditions and how to count a multiway branch or CASE statement. McCabe
suggests that each condition in a compound condition be counted separately
(McCabe, 1983:10). For example, the statement "IF C1 AND C2 THEN" would add
two to v(G) because it is equivalent to "IF C1 THEN IF C2 THEN". Critics say
that this is not realistic because, no matter how many conditions are considered,
only one of two branches will be followed. Following this logic, the calculation
of MEBOW disregards the number of conditions in a compound condition.
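The two counting conventions differ only in whether a compound predicate contributes one decision or one decision per simple condition. The sketch below assumes every decision is a two-way IF and that conditions are joined by AND or OR (FORTRAN .AND./.OR. or the plain keywords used in the text). It is a string-level illustration, not a parser.

```python
import re

# Matches FORTRAN-style .AND./.OR. or bare AND/OR between conditions.
LOGICAL_OPERATOR = re.compile(r"\.(?:AND|OR)\.|\b(?:AND|OR)\b", re.IGNORECASE)


def conditions(predicate):
    """Number of simple conditions in a (possibly compound) predicate."""
    return 1 + len(LOGICAL_OPERATOR.findall(predicate))


def vg(predicates, count_each_condition=True):
    """v(G) for two-way decisions, with or without McCabe's compound rule."""
    if count_each_condition:
        return 1 + sum(conditions(p) for p in predicates)
    return 1 + len(predicates)


preds = ["A .GT. B .AND. A .GT. C", "B .GT. C", "C1 OR C2 OR C3"]
print(vg(preds), vg(preds, count_each_condition=False))  # 7 4
```

The second value corresponds to the convention adopted for MEBOW, in which a compound IF still contributes only its two branches.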
The CASE statement was developed to simplify multiway branch statements
so that a series of nested IF...THEN...ELSE structures would not have to be
created. Therefore, "it is natural to expect that a good complexity metric
should assign a lower complexity value to a t-way CASE structure than its
equivalent nested IF...THEN...ELSE structure" (Jayaprakash and others, 1987:242).
For MEBOW, an N-way CASE statement's complexity is calculated as $2N$, instead
of the $N^2$ complexity that would otherwise be calculated by following the
MEBOW branch counting rules.

The definition of INFO states that the "fan-in of procedure A is the
number of local flows into procedure A plus the number of data structures from
which procedure A retrieves information" and the "fan-out of procedure A is the
number of local flows from procedure A plus the number of data structures
which procedure A updates" (Henry and Kafura, 1981:513). An exact description
of what constitutes fan-in or fan-out is not given. For example, is an array
passed into a procedure counted as 1, or as 1 for each element in the array?
Is a record counted as 1, or as 1 for each field in the record that is modified
in the procedure? Following the counting examples given by Kitchenham, the
fan-in is considered "the number of data structures (not individual elements)
the program reads from" (Kitchenham, 1988:370). The fan-out is calculated the
same way. Therefore, a pointer variable or array that is input or output adds
just one to fan-in or fan-out.
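The "structures, not elements" rule can be sketched in Python as follows. This covers only the data-structure part of fan-in and fan-out (local flows from other procedures would be added separately), and it assumes INFO takes Henry and Kafura's (fan-in x fan-out)^2 form without a length factor, since this section does not restate the formula. The `Access` record and the sample names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Access:
    structure: str            # data structure touched, e.g. "EMP_TABLE" (hypothetical)
    element: Optional[str]    # element or field, e.g. "EMP_TABLE(3)"; ignored when counting
    reads: bool
    writes: bool


def fan_in_out(accesses: Iterable[Access]) -> Tuple[int, int]:
    """Count distinct data structures (not elements) read from and written to."""
    accesses = list(accesses)
    read = {a.structure for a in accesses if a.reads}
    written = {a.structure for a in accesses if a.writes}
    return len(read), len(written)


def info_value(fan_in: int, fan_out: int) -> int:
    """Assumed Henry-Kafura form (fan-in * fan-out) ** 2, without the length factor."""
    return (fan_in * fan_out) ** 2


example = [
    Access("EMP_TABLE", "EMP_TABLE(1)", reads=True, writes=False),
    Access("EMP_TABLE", "EMP_TABLE(2)", reads=True, writes=False),
    Access("PAY_REC", "PAY_REC.GROSS", reads=False, writes=True),
    Access("PAY_REC", "PAY_REC.NET", reads=False, writes=True),
    Access("TOTAL", None, reads=True, writes=True),
]
fi, fo = fan_in_out(example)
print(fi, fo, info_value(fi, fo))   # 2 2 16
```

Two reads of EMP_TABLE and two writes to fields of PAY_REC each add only one, and TOTAL, being both read and written, appears in both counts. A procedure with no such connections gets an INFO value of 0.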

In languages such as Ada, it is easy to determine which procedure
parameters are counted as input data flows, which are counted as output data
flows, and which are counted for both. These parameters are designated "in",
"out", and "in out" in the procedure declaration. This determination is more
difficult in a language such as FORTRAN. In FORTRAN, a parameter or global
variable is considered an output data flow if it is used on the left side (left of
the assignment operator "=") of an assignment statement, and it is considered
an input data flow if it is used anywhere else. If it is used for both purposes,
it is counted in both fan-in and fan-out.
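Both rules can be sketched in Python. The Ada part reads the declared modes (treating an omitted mode as Ada's default, "in"); the FORTRAN part applies the left-of-"=" rule. The FORTRAN sketch assumes each statement is either a simple assignment (TARGET = expr or TARGET(subscripts) = expr) or contains no "=" at all, so logical IF statements and DO loops would need extra handling. All function names are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple


def ada_fan_in_out(params: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Fan-in/fan-out contributed by Ada parameters given as (name, mode)."""
    fan_in = fan_out = 0
    for _name, mode in params:
        mode = " ".join(mode.lower().split()) or "in"   # omitted mode defaults to 'in'
        if mode in ("in", "in out"):
            fan_in += 1
        if mode in ("out", "in out"):
            fan_out += 1
    return fan_in, fan_out


_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def fortran_flows(statements: List[str], tracked: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Classify tracked parameters/globals (upper case) as inputs, outputs, or both.

    Assumption: statements are simple assignments or contain no '='.
    """
    inputs: Set[str] = set()
    outputs: Set[str] = set()
    for stmt in statements:
        lhs, sep, rhs = stmt.partition("=")
        if sep:
            lhs_names = [n.upper() for n in _NAME.findall(lhs)]
            if lhs_names and lhs_names[0] in tracked:
                outputs.add(lhs_names[0])            # left of '=': output data flow
            # subscripts on the left and everything on the right are reads
            used = lhs_names[1:] + [n.upper() for n in _NAME.findall(rhs)]
        else:
            used = [n.upper() for n in _NAME.findall(stmt)]
        inputs.update(n for n in used if n in tracked)
    return inputs, outputs


print(ada_fan_in_out([("A", "in"), ("B", "out"), ("C", "in out"), ("D", "")]))
print(fortran_flows(["TOTAL = TOTAL + RATE(I)", "RATE(I) = BASE * 2", "CALL REPORT(TOTAL)"],
                    {"TOTAL", "RATE", "BASE", "I"}))
```

In the FORTRAN example TOTAL and RATE are both assigned and read, so each is counted in fan-in and fan-out, as the rule above requires.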

To determine whether a variable is global, the procedure must be parsed and
each token found compared to a list of the reserved words of the language being used.
If the token is not a reserved word, it should be compared to the local symbol
table, which is a list of those variables declared within the procedure, and
to the procedure's parameter list. If the token is in neither of
these lists, then it should be assumed that the token is a global variable. This
definition may not be correct for a language such as FORTRAN, which
allows the programmer to introduce variables at any point in the procedure through
implicit variable type declarations. Using this counting strategy, these
variables would be incorrectly counted as globals, which will increase the INFO
value. This is incorrect, but any module that has such declarations should be
monitored closely anyway, because this type of declaration may be confusing to a
maintainer. In Ada, a loop parameter within a FOR loop also appears to have
this behavior, but this loop parameter can easily be found and added to the
local symbol table during parsing because it appears in the FOR loop
parameter specification.
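A minimal Python sketch of this symbol-table check for Ada source follows. It assumes a deliberately small subset of Ada reserved words, strips string literals and comments before tokenizing, and adds the identifier that follows FOR to the local symbol table. As in the text, any remaining unknown name is assumed global, so a real tool would also have to filter out called subprograms and library units. The names `find_globals` and `ADA_RESERVED` are hypothetical.

```python
import re
from typing import Iterable, List

# Small subset of Ada reserved words, enough for this illustration only.
ADA_RESERVED = {
    "and", "begin", "else", "elsif", "end", "for", "if", "in", "is",
    "loop", "not", "null", "or", "out", "procedure", "return", "then", "while",
}
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def _strip_strings_and_comments(src: str) -> str:
    src = re.sub(r'"[^"\n]*"', '""', src)   # blank out string literals first
    return re.sub(r"--[^\n]*", "", src)     # then drop Ada comments


def find_globals(body: str, local_names: Iterable[str], params: Iterable[str],
                 reserved=ADA_RESERVED) -> List[str]:
    """Tokens that are neither reserved words, locals, nor parameters."""
    known = {n.lower() for n in local_names} | {n.lower() for n in params}
    globals_found: List[str] = []
    expect_loop_param = False
    for token in _IDENT.findall(_strip_strings_and_comments(body)):
        t = token.lower()                    # Ada identifiers are case-insensitive
        if expect_loop_param:                # identifier right after FOR is the loop parameter
            known.add(t)
            expect_loop_param = False
            continue
        if t in reserved:
            expect_loop_param = (t == "for")
            continue
        if t not in known and t not in globals_found:
            globals_found.append(t)
    return globals_found


body = """for I in 1 .. Count loop   -- walk the table
   Total := Total + Table (I) * Rate;
end loop;
"""
print(find_globals(body, local_names=["Total"], params=["Table", "Count"]))   # ['rate']
```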

There can also be some difficulty in determining whether a parsed token is an
array or a function while counting globals. This is because in some languages,
such as FORTRAN, both arrays and function calls delineate their indexes and
parameters with parentheses. For example, the statement "RATE =
TAXRATE(EMPLOYEE)" could either use EMPLOYEE as an index to the array
TAXRATE, or EMPLOYEE could be a parameter to the function TAXRATE. If
TAXRATE is not declared as a parameter or described in a COMMON block as an
array, the statement is ambiguous. The only way to determine the semantics of
this statement is to parse the entire program and determine where TAXRATE is
declared. Fortunately, this problem will not occur in most modern structured
languages.
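The whole-program parse can be sketched as two passes in Python: first collect every name declared with a dimension list, then classify each NAME(...) reference. This is a simplification that assumes one namespace for the whole program and handles only a subset of FORTRAN 77 declaration statements; statement functions, which also look like subscripted assignments, are not handled. The function names are hypothetical.

```python
import re
from typing import Iterable, Set

# Subset of FORTRAN 77 statements that can declare an array (assumption).
_DECL = re.compile(
    r"^\s*(DIMENSION|REAL|INTEGER|LOGICAL|CHARACTER|DOUBLE\s+PRECISION|COMMON)\b(.*)$",
    re.IGNORECASE,
)
_SUBSCRIPTED = re.compile(r"([A-Za-z][A-Za-z0-9_]*)\s*\(")


def declared_arrays(program_lines: Iterable[str]) -> Set[str]:
    """First pass over the whole program: names declared with a dimension list."""
    arrays: Set[str] = set()
    for line in program_lines:
        m = _DECL.match(line)
        if m and "FUNCTION" not in m.group(2).upper():   # skip 'REAL FUNCTION F(X)'
            arrays.update(name.upper() for name in _SUBSCRIPTED.findall(m.group(2)))
    return arrays


def classify(name: str, arrays: Set[str]) -> str:
    """Second pass: NAME(...) is an array reference if declared as one, else a function call."""
    return "array" if name.upper() in arrays else "function"


program = [
    "      SUBROUTINE PAY(EMPLOYEE, RATE)",
    "      COMMON /RATES/ TAXRATE(100)",
    "      RATE = TAXRATE(EMPLOYEE)",
    "      GROSS = BONUS(EMPLOYEE)",
]
arrays = declared_arrays(program)
print(classify("TAXRATE", arrays), classify("BONUS", arrays))   # array function
```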

Following the above reasoning, one must conclude that it is not possible
to write a single detailed counting strategy that covers all cases. Instead, a separate
counting strategy is needed for each language. This is the only way to account
for the differences inherent in each language.

Threshold Value. According to Kearney, any complexity metric should
have the property of normativeness (Kearney and others, 1986:1047). This
means that the metric should provide an acceptable norm, or standard, that
specifies an allowable degree of complexity. A suitable threshold cannot be
determined within the scope of this research, because a decision was made not to
create a tool to generate the metrics; instead, algorithms to implement such a
tool are given in Appendix B, Algorithms for Metric Value Computation.

Threshold ranges can be calculated by creating test cases and running them
through a program that calculates the metric values. Taking examples
of two different programs that perform the same function from sources such as
the classic book The Elements of Programming Style by Kernighan and Plauger,
researchers can compare the metric values to see how well they relate to the
subjective opinions of structuredness and complexity offered about the programs.
Another source of equivalent programs is software maintained for the Air Force
in the Air Logistics Centers (ALCs). Different versions of programs that have
been improved by the maintainers should be available, and the metric values
can be determined for these programs and compared to the subjective judgments
of the maintainability of each program. This method of obtaining programs to
establish a useful threshold is preferred, as supporting AFOTEC's task of determining
program maintainability is the primary purpose of this thesis effort. These
programs are preferred because they will typically be larger and more
complex than the academic examples, and a judgment has been made about each
program's maintainability, not just about whether it is more or less complex than another
program.
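One possible way to turn such judged programs into a threshold is to choose the cut-off that best separates the programs maintainers judged hard to maintain from the rest. The Python sketch below assumes a single metric value per program and a yes/no judgment; the function name and the sample values are hypothetical, not data from any study, and other fitting criteria could reasonably be used.

```python
from typing import List, Tuple


def best_threshold(samples: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Pick the cut-off t that best matches subjective judgments.

    samples: (metric value, judged hard to maintain?) for each program.
    A value above t is predicted 'hard to maintain'.
    Returns (t, fraction of samples where prediction and judgment agree).
    """
    values = sorted({v for v, _ in samples})
    candidates = [values[0] - 1] + values        # also allow 'everything is hard'

    def agreement(t: float) -> float:
        return sum((v > t) == hard for v, hard in samples) / len(samples)

    best = max(candidates, key=agreement)        # first candidate wins on ties
    return best, agreement(best)


# Hypothetical judgments of five program versions.
samples = [(10, False), (25, False), (40, True), (90, True), (30, False)]
print(best_threshold(samples))   # (30, 1.0)
```

With real data the agreement will rarely reach 1.0; the fraction returned gives a rough sense of how well any single cut-off fits the judgments.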

Comparing the metric values of more and less maintainable programs will
show more than a practical threshold value. It will also show whether the two
metrics should be weighted equally in the consideration of program
maintainability. If one metric consistently agrees with the judgments of which
programs are more maintainable, then it can be weighted more heavily than the other.
Another consideration for these weights is that the test cases might suggest
changing them as a function of the program's characteristics. For example, programs
written in one language might require different weights to better reflect
maintainability than programs written in a different language. Evaluating
different classes of programs, such as avionics systems and database systems,
may warrant the use of different weights for the metrics. Also, extraneous code
can be added to programs, and the differences in the metric values will show
the sensitivity of the indexes.
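The weighting question can be explored with pairs of program versions that maintainers have ranked. The Python sketch below assumes a weighted sum of MEBOW and INFO, each scaled by its largest observed value so the two are comparable, and scores a weight by how often the lower combined value belongs to the version judged more maintainable. The scaling scheme, function names, and sample values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]
Pair = Tuple[Metrics, Metrics, str]   # (version A, version B, "A" or "B" judged more maintainable)


def agreement(pairs: List[Pair], score: Callable[[Metrics], float]) -> float:
    """Fraction of pairs where the lower score belongs to the version judged better."""
    hits = 0
    for a, b, better in pairs:
        sa, sb = score(a), score(b)
        if sa == sb:
            continue                          # a tie makes no prediction; counted as a miss
        hits += ("A" if sa < sb else "B") == better
    return hits / len(pairs)


def weighted(w: float, pairs: List[Pair]) -> Callable[[Metrics], float]:
    """Score = w * MEBOW/max(MEBOW) + (1 - w) * INFO/max(INFO), scaled over the sample."""
    max_mebow = max(max(a["MEBOW"], b["MEBOW"]) for a, b, _ in pairs)
    max_info = max(max(a["INFO"], b["INFO"]) for a, b, _ in pairs)
    return lambda m: w * m["MEBOW"] / max_mebow + (1 - w) * m["INFO"] / max_info


# Hypothetical before/after versions of three programs.
pairs: List[Pair] = [
    ({"MEBOW": 120, "INFO": 400}, {"MEBOW": 200, "INFO": 300}, "A"),
    ({"MEBOW": 90, "INFO": 50}, {"MEBOW": 60, "INFO": 80}, "B"),
    ({"MEBOW": 300, "INFO": 900}, {"MEBOW": 150, "INFO": 100}, "B"),
]
print("MEBOW alone:", agreement(pairs, lambda m: m["MEBOW"]))
print("INFO alone: ", agreement(pairs, lambda m: m["INFO"]))
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, agreement(pairs, weighted(w, pairs)))
```

Running the same sweep separately for each language or class of program would show whether those groups call for different weights.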

Validation  of  Metrics 

Given this proposed method to measure maintainability, a procedure to
determine how well the metrics actually measure maintainability must be
developed. This is a very important consideration; it is possible to create an
intuitively appealing metric that does not measure what it was intended to
measure. AFOTEC is aware of the importance of validation, as is evidenced by
its efforts to validate its Vol. 3 process (Lynn, 1985). An interesting
discussion of validation is given in Conte:
It is far too easy to create an attractive, intuitive model without
providing data that shows that the model actually does explain the
software phenomenon of interest. Attempts to validate programming
models have involved collecting data via software analyzers, report
forms, and interviews. Statistics have been employed to show
relationships among metrics and to try to produce functions of those
relationships for explanatory and predictive purposes (Conte and
others, 1986:22-23).

The suggested validation technique is for the maintaining organization to
track a pilot project's history using a survey instrument. This survey
instrument will record various system parameters that influence maintainability.
After a prescribed period of maintenance has occurred, the results of the
surveys would be returned to AFOTEC and compared to the predicted
maintainability of each program. This period of maintenance should include the
first or second maintenance block changes, because by then the maintainers will
have a good understanding of the system and of which portions of the program are
the most difficult to understand and modify. At that time, they will have
enough knowledge about the software to make subjective judgments about the
maintainability of each module, and a reasonable database of changed modules
will be available.

Within this survey instrument, one parameter of interest is the occurrence of
errors. "An accepted validation technique for complexity metrics is to show the
correlation of the metric to the occurrence of errors" (Henry and others,
1981:130). Error occurrence is certainly a factor that drives maintenance, and many
experiments described in this thesis have used it to
reflect a program's complexity.
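The correlation Henry and others describe could be checked with a rank correlation, which does not assume a linear relationship between metric values and error counts. The Python sketch below computes Spearman's rank correlation (the Pearson correlation of average ranks, so ties are handled). The choice of Spearman rather than another statistic, the function names, and the sample values are assumptions for illustration, not data from any study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5          # undefined if either list is constant


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-module values.
mebow_values = [120, 45, 300, 80, 210]
error_counts = [3, 1, 7, 1, 4]
print(round(spearman(mebow_values, error_counts), 3))
```

A coefficient near +1 would mean modules with higher metric values tend to have more errors; a value near 0 would suggest the metric is not tracking error occurrence.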

Another factor that should be included is the time required to modify
each module, however a module is defined. Kafura and Canning described
their use of metrics to identify outliers with respect to the number of errors and
the amount of coding time required. In addition to time, other factors might be
considered. Other information that AFOTEC has collected in the past while
validating the Vol. 3 includes program size, software language, frequency of
software updates, percentage of code that changes with each update, subjective
ratings of maintainability from the maintenance teams, and training time
required for both new programmers and more experienced programmers (Lynn,
1985). As the following passage explains, the proposed survey tool should
include all of this information, in addition to the error occurrences and the
modification times for each module.
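The survey items listed above could be captured in a record such as the following Python sketch. The class names, field names, units, and the 1-to-5 rating scale are assumptions for illustration, not a format prescribed by AFOTEC or by Lynn (1985).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModuleChange:
    module: str             # module name, however "module" is defined for the language
    errors_found: int       # errors traced to this change
    hours_to_modify: float  # time spent making the change


@dataclass
class SurveyRecord:
    program: str
    language: str
    size_loc: int                      # program size in lines of code
    updates_per_year: float            # frequency of software updates
    percent_code_changed: float        # per update
    maintainability_rating: int        # team's subjective rating; 1 (hard) to 5 (easy) is assumed
    training_hours_new: float          # training time for new programmers
    training_hours_experienced: float  # training time for experienced programmers
    changes: List[ModuleChange] = field(default_factory=list)

    def totals_by_module(self) -> Dict[str, Tuple[int, float]]:
        """Total errors and modification hours per module, for comparison with metric values."""
        totals: Dict[str, Tuple[int, float]] = {}
        for c in self.changes:
            errors, hours = totals.get(c.module, (0, 0.0))
            totals[c.module] = (errors + c.errors_found, hours + c.hours_to_modify)
        return totals


# Hypothetical example record.
record = SurveyRecord("PAYROLL_SYS", "Ada", 12000, 4, 5.0, 3, 80.0, 20.0)
record.changes += [ModuleChange("COMPUTE_PAY", 2, 6.5),
                   ModuleChange("PRINT_REPORT", 0, 1.0),
                   ModuleChange("COMPUTE_PAY", 1, 3.0)]
print(record.totals_by_module())
```

The per-module totals are what would be compared with each module's MEBOW and INFO values; the program-level fields serve as the extra information discussed next.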

While this extra information might not be used in a correlation with the
metric values for the system, it may help in drawing conclusions about the data
received. For example, suppose the metrics identify one system as more easily
maintainable than a second system, yet the number of errors reported and the
amount of time required to make changes on the first system are greater than on the
second; several factors might be involved. If the first system is maintained by
novice programmers, while the second system is maintained by experienced
programmers, the differences from the predicted maintainability can be explained
by the experience level. Conversely, if both maintenance teams are equivalent
but the error rates and the time required to implement a change still contradict
the prediction, then a good case can be made that the metrics are not measuring
maintainability and should be modified. This extra information can be used to
determine whether external factors have unduly influenced the ease of program
maintenance and should be taken into account during the validation process.

This type of measurement is an effort to enforce some engineering
discipline on the collection and interpretation of data. AFOTEC's effort to
validate the Vol. 3 relied upon estimates of these factors by the project
managers and senior maintenance personnel, because either no detailed project
data was kept by the maintenance staff, or this data was not released to
AFOTEC. A decision to collect this data must be made by both AFOTEC and the
maintaining organization if a pilot study is to be conducted to validate these
metrics.

Figure 13 shows a flow diagram of what should happen during the
validation process. The left side of the diagram defines significant development
milestones from the earliest point at which AFOTEC is able to perform software
evaluations. These development milestones are not meant to reflect the
software development phases as represented in MIL-STD-2167A, but to show the
major managerial changes as the program progresses.

The Operational Test and Evaluation phase occurs during the integration
and test phases of software development. This phase includes two different
blocks in the flow diagram. The first block depicts a baseline of the software
as it is first delivered to AFOTEC for evaluation. The second block represents
future evaluations of the software during the development process, which might
take several years on a large system. These evaluations can be compared with
the baseline to determine whether the software is being made more or less
maintainable by the many changes made during integration and test.
Currently, multiple maintainability evaluations are performed on software in
order to detect any problems so that the developer can be directed to fix them.

The Program Management Responsibility Turnover (PMRT) represents the
final version of software delivered by the developer. After PMRT, the
maintaining organization has maintenance control of the software. This block
suggests a final post-development baseline to determine how maintainable the
delivered system is.

The System Maintenance/Operational Support phase encompasses two blocks
within the flow diagram. The first is data collection. These data reflect the
factors described above, error count and time to modify, with other factors as
extra information. Once an acceptable amount of data is collected, a comparison
is made between the metric values associated with the modules and their actual
error counts and modification times.

Summary

This  chapter  is  the  culmination  of  the  work  described  in  the  previous  two 
chapters.  The  use  of  hybrid  metrics  was  introduced  in  Chapter  Two,  but  little 
documentation  was  given  to  support  their  use.  MEBOW  and  INFO  were  explained 
in  some  detail  in  Chapter  Two,  but  the  experiments  that  showed  their 
effectiveness  were  not  described.  Chapter  Three  demonstrated  how  to  determine 
if  a  metric  measures  maintainability  and  two  metrics  were  exhibited  as  being 
able  to  measure  maintainability  effectively  if  paired  together. 

This chapter provided a lengthy description of the usefulness of hybrid
metrics, along with descriptions of experiments indicating that hybrid
metrics are a valuable measurement technique. Then supporting evidence was
given that MEBOW and INFO are both useful metrics and have been used to
measure complexity successfully. Metric calculation issues were discussed, and
examples were given showing the application of the described counting
strategies.

While the use of hybrid metrics and of the two metrics described has been
documented, as far as is known they have not been used together. Therefore,
important information about the metrics, such as their sensitivity, useful
threshold values, and whether one metric better reflects maintainability for the types
of programs they will be used to measure, has not been determined. This is a
reason to emphasize the importance of metric validation.


V.  Conclusions  and  Recommendations 


Introduction 

This research has involved a survey of metrics, a definition of criteria to
determine which metrics measure maintainability, and an in-depth look at two
metrics that meet most of these criteria. A determination will now be made of
whether using these metrics will actually solve the problems posed
in Chapter One. This chapter explains what was
accomplished and how the problem was addressed. Then the limitations of the
solution are elaborated. Finally, some recommendations for further research
are examined.

Conclusions 

A framework for the automated evaluation of software maintainability was
developed. A set of criteria was defined to determine which automatable complexity metrics
reflect maintainability better than others. The metrics discussed
were compared against each criterion, and the metrics that had the most complete
coverage in each of the three criteria groups were determined. A combination of
the two metrics MEBOW and INFO was determined to have the best overall
coverage of the criteria.

After  these  two  metrics  were  selected  for  use,  their  implementation  was 
studied.  Algorithms  to  calculate  the  two  metrics  were  developed,  although  a 
tool  to  generate  the  metrics  was  not  created.  A  method  to  determine  useful 
threshold  values  for  these  metrics  was  explained.  Sources  for  test  cases  to 
determine  these  thresholds  were  given.  A  procedure  to  validate  the  use  of 
these  metrics  to  measure  maintainability  was  developed.  This  procedure 
specified  what  data  must  be  collected,  and  when  it  should  be  collected. 
The following section examines two issues: how the given problem was
solved, and the limitations and benefits of automated metrics (that is,
how metrics can be helpful once the limited information they yield is
understood).

How  the  Problem  was  Solved.  The  problem  was  to  develop  software 
maintainability  metrics  to  be  incorporated  into  the  AFOTEC  Vol.  3.  Constraints 
on  the  metrics  researched  were  that  they  had  to  fit  into  the  scope  of  the  seven 
characteristics  of  maintainability  explained  in  Chapter  One.  These  metrics  were 
to  measure  aspects  of  maintainability  that  the  Vol.  3  does  not,  and  they  must 
be  automatable.  The  following  discussion  presents  each  of  these  issues. 

These metrics were to be incorporated into the Vol. 3 and applied within
the seven maintainability characteristics. This goal was only partially fulfilled.
When this research began, the metrics were to be incorporated into the Vol. 3 as
questions, with weight equal to that of the other questions. As more information about
metrics was acquired, though, a decision was made not to include these metrics
within the overview of the Vol. 3, but to use them in a separate maintainability
evaluation whose results could support the Vol. 3. Also, separate metrics to
reflect each of the seven maintainability characteristics were not discovered.
Simplicity is the only characteristic that is measured by the proposed metrics.

These metrics were to measure aspects of maintainability that the Vol. 3
does not. MEBOW reflects complexity issues such as the levels of nesting in
control structures and the scope of branches. These are addressed within the
Vol. 3, but are not measured in the same way that MEBOW measures them. The Vol. 3
asks the evaluator for a subjective estimate of how complex a module's
nesting is, while MEBOW objectively calculates the complexity caused by nesting.
INFO reflects the complexity of the data connections between modules, while the
Vol. 3 subjectively measures the number of global variables used and how well
the input and output parameters to a module are described. Therefore, MEBOW
and INFO measure some aspects of maintainability that differ from those the Vol. 3
measures. These metrics can be automated and used to evaluate all of a program's
source code, instead of just a fraction of it.

Each aspect of the problem has been considered, and the constraints have
been met. The use of an automated tool that calculates MEBOW and INFO to
measure the maintainability of software is a solution to the given problem.
The aspect of this problem that was not sufficiently answered is the automated
use of metrics that measure qualities supporting the six characteristics other
than simplicity.

The Limitations and Benefits of Metrics. The metrics presented can be
used to reflect maintainability if they are used correctly and their limitations
are understood. These metrics can be used to gather data, but the
interpretation of this data must be made with a clear understanding of what the
data mean. Rodriguez and Tsai state in their conclusion, "The final conjecture
states that the metrics should not be accepted as axioms. They give
information, but that information has to be interpreted in the context of the
particular system being measured" (Rodriguez and Tsai, 1986:368).

Metric  analyses  are  useful  only  to  compare  "apples  with  apples",  which  is 
a  reason  that  a  well-defined  counting  strategy  is  needed  (Conte  and  others, 
1986:27).  Someone  comparing  two  different  sets  of  data  should  have  confidence 
that  they  were  both  counted  in  a  consistent  manner.  These  software  metrics 
require  calibration  from  historical  data  gathered  in  a  specific  environment  to 
establish  appropriate  weights  and  threshold  values  (ibid).  The  suggested  metrics 
have  been  shown  to  reflect  the  complexity  of  modules  written  in  procedural 
languages, but no evidence supports their use with different paradigms. An
application of MEBOW to modules written in languages such as LISP, PROLOG, or
Smalltalk  might  not  be  practical.  Any  comparison  of  measured  values  from 
modules  written  in  one  of  these  languages  and  other  modules  written  in  a 
procedural  language  such  as  Ada  might  not  reflect  their  relative  complexity. 

While software metric results can assist the decision-making process of
software development and testing personnel, they cannot replace this process
(ibid). A module that rates below the threshold value may still be more
difficult to maintain than one that scores above the threshold, depending on
maintainability factors that are not measured by the proposed metrics. For
example, a well-commented module that is complex may be more easily
maintained than a less complex module that has no comments. This factor is
not reflected by the automated metrics. That is why the metrics should be used
in an advisory capacity, as Harrison and Cook suggest: "On the practical side,
our study suggests that software project managers can use software complexity
measures as a tool in identifying the few subprograms most likely to contain the
majority of errors, and hence can allocate their testing resources more
efficiently" (Harrison and Cook, 1987:214).

Along these same lines, the way software development managers use the
metrics should be limited. According to Conte, "software metrics and models are
intended to be used to manage products, not for evaluating the performance of
technical staff" (Conte and others, 1986:27-28). If programmers understand
that their performance is being measured, they will quickly find ways to obtain
improved metric results, even if this does not improve the maintainability of the
program.

Another limitation is that the difficulty and cost of computing metrics
may be high (ibid). This is a problem with the Vol. 3 evaluation
technique. The automation of MEBOW and INFO should lessen the cost of
measuring these metrics.

Several benefits of using these automated metrics have been briefly
described. One benefit is that the modules most likely to contain errors
will be identified, and greater testing resources can be allocated to those
modules. Also, modules that are too complicated are recognized, and they
can be further decomposed into less complex, more maintainable modules. Cote
expresses an interesting analogy for the use of metrics by asserting, "metrics
can greatly help in depicting the features and layouts embedded in thousands of
lines of code, in much the same way that gauges and dials give a nuclear plant
operator an idea of what is going on inside a reactor" (Cote and others,
1988:121).

These  sections  have  shown  that  the  automated  calculation  of  MEBOW  and 
INFO  will  resolve  the  problems  this  research  has  attempted  to  solve.  The 
limitations  inherent  with  the  use  of  automated  metrics  have  been  described, 
along  with  the  benefits  from  their  use.  Metrics  have  an  important  application, 
but  should  not  be  used  out  of  their  limited  context. 

Recommendations 

The  use  of  a  hybrid  control  and  data  structure  metric  appears  to  answer 
AFOTEC's  needs.  Before  these  metrics  should  be  used,  though,  certain  issues 
must  be  considered.  Recommendations  to  resolve  these  issues  are  explained. 

The first recommendation is that a tool that measures both MEBOW and
INFO must be built. Although either metric value can be computed manually,
this process is difficult and time-consuming. Manual computation would
also violate one of the constraints given, that no extra work be given to the
evaluators. An implementation of these metrics in an automated tool is a
necessary first step for the use of these metrics.

Until  some  type  of  study  has  shown  that  this  hybrid  metric  reliably 
reflects  maintainability,  its  use  should  be  considered  advisory.  A  pilot  study 
following  the  validation  method  presented  in  Chapter  Four  will  relate  the  metric 
values  to  maintainers'  subjective  ideas  of  which  modules  were  more  maintainable. 
Once  this  study  has  been  accomplished,  a  determination  can  be  made  if  the 
metrics  actually  reflect  maintainability.  If  the  study  suggests  they  do  reflect 
maintainability,  the  metrics'  results  can  be  used  in  the  same  manner  as  the  Vol. 
3  evaluation  results.  If  the  study  suggests  the  metrics  do  not  measure 
maintainability,  perhaps  some  weighting  of  the  metrics  can  be  used  that  will 
better  reflect  maintainability,  or  a  different  class  of  metric  can  be  included. 
While  data  for  this  study  is  being  accumulated,  these  metrics  can  be  compared 
to  the  Vol.  3  results  and  other  subjective  measurements  of  maintainability. 

The amount of data that will have to be collected by maintainers for this
pilot study may cause objections from the maintainers. They may resent the
amount of time and effort required for a study that will not immediately support
their office. The importance of collecting this data must be emphasized to the
maintainers, as well as its positive impact on the evaluation of the maintainability
of future software systems.

A recommendation for further study is for someone to complete the
validation pilot study. This is most likely an effort that AFOTEC will have to
provide for itself, as the data collection may take some time. Once this data
collection has been accomplished, following the guidelines presented in the
previous chapter, a comparison should be made between the metrics' maintainability
predictions and the collected maintainability data.


Another important area of research is the discovery of software metrics that reflect
characteristics other than just complexity. For example, Harrison and Cook
presented a measure of "documentation" that they use to reflect how
self-descriptive a module is (Harrison and Cook, 1987:215). This measure is just a
ratio of the number of comments to the total number of lines in the module.
But perhaps other descriptiveness metrics are available that cannot be so
easily thwarted. Possibly some metric that reflects how modular the software is
can be developed. These types of metrics could be used along with the
complexity metrics already suggested and would broaden the metric coverage used to
determine software maintainability.
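A comment-to-line ratio of this kind, and the ease with which it can be thwarted, can be illustrated in Python. The sketch assumes full-line Ada comments ("--") divided by non-blank lines; Harrison and Cook's exact counting rules are not given here, so both that definition and the function name are assumptions.

```python
def documentation_ratio(source: str) -> float:
    """Full-line Ada comments divided by non-blank lines (assumed definition)."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    comments = sum(1 for ln in lines if ln.startswith("--"))
    return comments / len(lines)


module = "-- Compute gross pay\nGross := Hours * Rate;\n\nNet := Gross - Tax;\nreturn Net;\n"
print(documentation_ratio(module))   # 0.25
padded = "--\n--\n" + module          # two empty comment lines added
print(documentation_ratio(padded))   # 0.5
```

Adding two empty comments doubles the ratio without making the module any more self-descriptive, which is why a harder-to-game descriptiveness metric would be more useful.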


Summary 

The use of metrics to measure software's complexity and maintainability
shows much promise, even if the initial fascination with some metrics such as
Halstead's Software Science and McCabe's cyclomatic complexity has worn off.
This is well described by the conclusions of Kafura and Canning:

Even if software metrics had no other use their proven ability to
identify the most error-prone components would be of tangible
value to software developers. This tangible value is particularly
evident if the structure metrics can be used to identify the most
error prone components since this would permit the system to be
redesigned so as to avoid components of this type altogether.
Furthermore, information on error-prone components would allow the
testing or code review processes to be concentrated on these
components (Kafura and Canning, 1985:381).


Appendix  A:  Justification  for  Metric  Complexity  Criteria  Ratings 
In Chapter Three, Figure 9 shows a matrix of metrics vs. the metric
selection criteria developed in that chapter. Within the matrix, marks
indicate whether the metric meets each metric selection criterion and, if it
does, the level of agreement. This appendix explains the reasoning behind
the agreement indications. The metrics are listed in the same order as they
are shown in Figure 9, and remarks are given for each criterion.

LOC

Clear  and  Unambiguous:  As  the  example  in  Figure  1  shows,  LOC  calculation  is 
ambiguous  without  a  definite  counting  strategy. 

Intuitive:  A  longer  program  is  likely  to  be  more  difficult  to  maintain  than  a 
shorter  one. 

Language  Independent:  Each  language  needs  a  different  counting  strategy. 

Prescriptive: If a module is significantly longer than others in the same
program, it is a candidate for further decomposition. This does not give any
indication of how the module should be broken up, though.

Robustness: Making a program shorter by breaking it up into modules should
lessen its complexity, and this will be reflected by the metric. As a
counter-example, however, if all the comments are taken out of a module, it
will be shorter but more complex.

Accurately  Reflect  Control  Flow:  NA 

Ranking  Basic  Control  Structures:  NA 

Nesting  and  Compound  Conditions:  NA 

Accurately  Reflect  Data  Flow:  NA 

Indicates  Data  Amount:  NA 

Shows Data Use: NA

Reflects Inter-Module Data Links: NA
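The ambiguity of LOC can be shown by counting one fragment under three strategies. The sketch below assumes Ada-style "--" comments and uses semicolons as a rough stand-in for statements; all three strategies are illustrative choices, not the counting strategy of Figure 1, and the function name is hypothetical.

```python
def loc_counts(source: str) -> dict:
    """Count 'lines of code' three ways. Assumes Ada-style '--' comments;
    ';' serves as a rough, simplified statement terminator."""
    lines = source.splitlines()
    physical = len(lines)
    code_lines = 0
    statements = 0
    for line in lines:
        code = line.split("--", 1)[0].strip()  # drop any trailing comment
        if code:
            code_lines += 1
        statements += code.count(";")
    return {"physical": physical,
            "non_blank_non_comment": code_lines,
            "semicolons": statements}


example = """-- swap two values
procedure Swap (A, B : in out Integer) is
   T : Integer := A;

begin
   A := B; B := T;
end Swap;"""

print(loc_counts(example))
```

The same seven-line fragment yields 7, 5, or 4 "lines of code" depending on the strategy, so LOC values are comparable only when the strategy is fixed in advance.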


N 

Clear and Unambiguous: Chapter Two explains that some tokens can be both
operators and operands, which complicates the metric calculation.

Intuitive: As programs are composed of operands and operators, a program with
more operators and operands is likely to be more difficult to understand than
one with fewer operators and operands.

Language Independent: In languages such as LISP, the difference between
operators and operands is not clear.

Prescriptive: A module that has a larger number of operators and operands
should be decomposed into shorter modules, but this gives no suggestion of how
to accomplish the decomposition.

Robustness: If a module is changed, operators or operands will usually be
added or deleted, so the metric will reflect that a change occurred.

Accurately  Reflect  Control  Flow:  NA 

Ranking Basic Control Structures: NA

Nesting  and  Compound  Conditions:  NA 

Accurately  Reflect  Data  Flow:  NA 

Indicates Data Amount: While the total number of operands might be considered
a reflection of the amount of data used in a program, N includes both
operators and operands and does not show just the data.

Shows  Data  Use:  NA 

Reflects Inter-Module Data Links: NA
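Halstead's length N is the total number of operator occurrences (N1) plus the total number of operand occurrences (N2). This Python sketch assumes the module has already been tokenized and classifies tokens with a small hypothetical operator table; deciding what belongs in that table is the language-dependent step criticized above.

```python
# Hypothetical operator table for a tiny Ada-like fragment. A real tool needs a
# language-specific table, which is where the ambiguity noted above lies.
OPERATORS = {":=", "+", "-", "*", "/", "(", ")", ";"}


def halstead_length(tokens):
    """Return (N1, N2, N): total operator occurrences, total operand
    occurrences, and Halstead's program length N = N1 + N2."""
    total_operators = sum(1 for t in tokens if t in OPERATORS)
    total_operands = len(tokens) - total_operators  # everything else is an operand
    return total_operators, total_operands, total_operators + total_operands


tokens = ["X", ":=", "A", "+", "B", "*", "A", ";"]
print(halstead_length(tokens))  # (4, 4, 8)
```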


Span 

Clear  and  Unambiguous:  The  number  of  lines  between  two  variable  references  is 
not  difficult  to  count. 

Intuitive: The fewer the number of lines between variable references, the
more likely a maintainer will be able to understand a variable's usage.

Language Independent: This can be used with any language that has variables
and lines between them.

Prescriptive: If a variable has a large span, it is possible that the module
is too large and should be decomposed.

Robustness: A change in the number of lines between two references will be
reflected, and any added variable reference will change the span for that
variable.

Accurately Reflect Control Flow: NA

Ranking Basic Control Structures: NA

Nesting and Compound Conditions: NA

Accurately Reflect Data Flow: NA

Indicates Data Amount: NA

Shows  Data  Use:  This  reflects  how  data  is  used  within  a  module  to  the  extent 
that  it  shows  locality  of  variable  references. 

Reflects Inter-Module Data Links: NA
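A span can be computed from the line numbers at which each variable is referenced. The sketch below assumes those (variable, line) pairs are already available and counts the lines strictly between consecutive references; other counting conventions (for example, counting statements rather than lines) are equally plausible, and the function name is hypothetical.

```python
from collections import defaultdict


def spans(references):
    """references: (variable, line_number) pairs in any order.
    Returns {variable: [span, ...]}, each span being the number of lines
    strictly between two consecutive references to that variable."""
    lines_by_var = defaultdict(list)
    for var, line in references:
        lines_by_var[var].append(line)
    result = {}
    for var, lines in lines_by_var.items():
        lines = sorted(set(lines))  # two references on one line count once
        result[var] = [b - a - 1 for a, b in zip(lines, lines[1:])]
    return result


refs = [("Total", 3), ("Count", 4), ("Total", 7), ("Total", 20), ("Count", 5)]
print(spans(refs))  # {'Total': [3, 12], 'Count': [0]}
```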


INFO 


Clear and Unambiguous: The definition presented by Henry (Henry and Kafura,
1981) is straightforward, although others have not always been able to
calculate these values. For example, Harrison and Cook did not use
(fan-in * fan-out)^2 in their calculations (Harrison and Cook, 1987). This does
not suggest that the INFO calculation is ambiguous; instead, it reflects their
inability to separate their data into fan-in and fan-out. The difficulty lies
in the data, not in interpreting a counting strategy as it does with LOC.

Intuitive: The greater the number of connections to other modules, the
greater the possible impact of any change and the higher the complexity.

Language  Independent:  Any  language  that  can  be  broken  into  modules  can  have 
the  data  links  measured. 

Prescriptive: Henry and Kafura used INFO to show which modules in the Unix
kernel were data "choke-points" (Henry and Kafura, 1981:517). This gives an
indication of the effect that modifying a module will have on the other
modules in a system.

Robustness: If the number of data items referenced or modified changes, the
metric will reflect the change. It will not reflect a change to how the data
is used, or any change in the module's control flow.

Accurately  Reflect  Control  Flow:  NA 

Ranking Basic Control Structures: NA

Nesting  and  Compound  Conditions:  NA 

Accurately Reflect Data Flow: This will reflect the inter-module data flow,
but not the intra-module data flow.

Indicates  Data  Amount:  This  reflects  the  parameter  and  global  data  flows  into 
and  out  of  a  module. 

Shows Data Use: This shows the amount of data, but not its use, in a module.

Reflects Inter-Module Data Links: This reflects the parameter and global data
flows into and out of a module.
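As used in this thesis, INFO for a module is (fan-in * fan-out) squared. Henry and Kafura's original procedure complexity also multiplies this term by procedure length; the info_loc function below assumes that Kafura and Canning's INFO-LOC hybrid has that form. Both function names are hypothetical.

```python
def info(fan_in: int, fan_out: int) -> int:
    """INFO as used in this thesis: (fan-in * fan-out) squared."""
    return (fan_in * fan_out) ** 2


def info_loc(fan_in: int, fan_out: int, loc: int) -> int:
    """Length-weighted variant, assumed here to be LOC * INFO."""
    return loc * info(fan_in, fan_out)


print(info(3, 2))          # 36
print(info_loc(3, 2, 40))  # 1440
```

Squaring the product makes modules with many connections, the data "choke-points" mentioned above, stand out sharply.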


v(G) 

Clear  and  Unambiguous:  Different  counting  strategies  have  been  introduced  for 
control  structures  such  as  case  statements. 

Intuitive: The greater the number of branches in a module, the more difficult
it will be to understand.

Language Independent: This operates on a directed graph representation of a
program, so it is language independent.

Prescriptive: This tells when a program has become too complex. By viewing
the control flow graph, a determination can be made of how sections of code can
be separated into a different module without adversely affecting the structure
of the original module.

Robustness: A change in a control structure may or may not be reflected. A
rearrangement of a module that keeps the same number of branches will not
have a different v(G) value. Any change to sequential statements will not be
reflected, nor will any change to the data flows.

Accurately  Reflect  Control  Flow:  This  does  represent  the  control  flow  of  a 
module. 

Ranking Basic Control Structures: A series of sequential statements is
presented as less complex than a branch. An IF...THEN...ELSE branch is shown
as more complex than an IF...THEN branch. An iteration construct is more
complex than a sequential statement.

Nesting and Compound Conditions: Chapter Two argued that this does not
reflect nesting or compound conditions.

Accurately  Reflect  Data  Flow:  NA 

Indicates  Data  Amount:  NA 

Shows  Data  Use:  NA 

Reflects  Inter-Module  Data  Links:  NA 
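McCabe's v(G) is computed from a control flow graph as e - n + 2p, where e is the number of edges, n the number of nodes, and p the number of connected components (1 for a single module). The small graphs below are hand-built assumptions for three constructs; a real tool would derive them from source code.

```python
def cyclomatic(edges, num_nodes, components=1):
    """McCabe's v(G) = e - n + 2p for a control flow graph."""
    return len(edges) - num_nodes + 2 * components


# Nodes are numbered basic blocks; edges are (from, to) pairs.
sequence = [(0, 1), (1, 2)]          # s1; s2; s3
if_then = [(0, 1), (1, 2), (0, 2)]   # IF c1 THEN s2; s3
while_loop = [(0, 1), (1, 0), (0, 2)]  # WHILE c1 DO s2; s3

print(cyclomatic(sequence, 3))    # 1
print(cyclomatic(if_then, 3))     # 2
print(cyclomatic(while_loop, 3))  # 2
```

Lengthening the straight-line sequence adds one node and one edge per statement, so v(G) stays at 1, which is the robustness limitation noted above.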


Knot

Clear  and  Unambiguous:  A  crossing  of  control  paths  is  easily  understandable. 
If  the  language  in  use  allows  multiple  statements  on  one  line,  though,  some 
difficulty  in  determining  if  a  knot  occurs  may  arise. 

Intuitive: If structured programming is considered to be a useful paradigm,
then any reflection of unstructuredness will show added complexity.

Language Independent: As noted above, calculating the knot count is a problem
for languages that allow multiple statements on a line.

Prescriptive: According to Woodward (Woodward and others, 1983), if a module
with knots is rewritten to have fewer knots, it will be less complex. The
problem is how to rearrange the code to have fewer knots.

Robustness:  A  change  in  a  module’s  control  structure  will  be  reflected.  But 
any  change  to  sequential  code  or  data  flow  will  not  be  reflected. 

Accurately Reflect Control Flow: This does not represent the program's
underlying control flow; instead, it reflects the unstructuredness of the
source code text.

Ranking  Basic  Control  Structures:  No  ranking  is  given. 

Nesting  and  Compound  Conditions:  This  does  reflect  nesting  and  any  branches 
out  of  nested  control  structures. 

Accurately  Reflect  Data  Flow:  NA 

Indicates Data Amount: NA

Shows  Data  Use:  NA 

Reflects  Inter-Module  Data  Links:  NA 
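A knot occurs when the line ranges of two jumps cross, that is, when one jump starts strictly inside the other's range and ends strictly outside it. The sketch below implements that crossing test on (from_line, to_line) pairs. It is a simplified version that does not treat jumps sharing an endpoint as knots, and extracting the jumps from source text (the step complicated by multiple statements per line) is assumed to have been done already. Function names are hypothetical.

```python
from itertools import combinations


def is_knot(jump_a, jump_b):
    """Jumps are (from_line, to_line) pairs, forward or backward. They form a
    knot when their line ranges cross without one nesting inside the other."""
    lo_a, hi_a = sorted(jump_a)
    lo_b, hi_b = sorted(jump_b)
    return lo_a < lo_b < hi_a < hi_b or lo_b < lo_a < hi_b < hi_a


def knot_count(jumps):
    """Count crossing pairs among a module's jumps."""
    return sum(1 for a, b in combinations(jumps, 2) if is_knot(a, b))


nested = [(2, 10), (4, 8)]    # inner jump lies wholly inside the outer one
crossing = [(2, 8), (5, 12)]  # ranges overlap without nesting
print(knot_count(nested), knot_count(crossing))  # 0 1
```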


MEBOW

Clear  and  Unambiguous:  This  is  slightly  more  complex  to  understand  than 
either  v(G)  or  Knot,  but  is  precisely  defined. 

Intuitive: The more complex the control structures are in a module, the more
complex the module is.

Language  Independent:  This  operates  on  a  directed  graph  representation  of  a 
program. 

Prescriptive: This tells when a program has become too complex. By viewing
the control flow graph, a determination can be made of how sections of code can
be separated into a different module without adversely affecting the structure
of the original module.

Robustness: To a greater extent than either v(G) or Knot, this will reflect
any change in a module's control flow. But it shares their limitation: it does
not reflect data flow or the amount of sequential code.

Accurately Reflect Control Flow: This reflects the control flow as well as
v(G) does, and shows unstructuredness as well as Knot does.

Ranking Basic Control Structures: This reflects the ordering: sequential
statements < condition statements < iteration statements.

Nesting  and  Compound  Conditions:  This  reflects  nesting  because  of  the  scope 
component  in  each  branch's  value. 

Accurately  Reflect  Data  Flow:  NA 

Indicates  Data  Amount:  NA 

Shows  Data  Use:  NA 

Reflects  Inter-Module  Data  Links:  NA 

E 

Clear  and  Unambiguous:  This  has  the  same  counting  ambiguities  as  N. 

Intuitive: Studying the total and unique operators and operands is
understandable, but the weighting factors in the E value make it harder to
understand.

Language Independent: This has the same problem as N.

Prescriptive: A higher value will suggest that the program needs to be
decomposed, but does not give any guidelines on how to implement this
decomposition.

Robustness: Changing the number of operators or operands will make a
difference in the metric value, especially if the operator or operand has not
been used yet in the module.

Accurately  Reflect  Control  Flow:  NA 

Ranking  Basic  Control  Structures:  NA 

Nesting  and  Compound  Conditions:  NA 

Accurately  Reflect  Data  Flow:  NA 

Indicates Data Amount: NA

Shows Data Use: NA

Reflects Inter-Module Data Links: NA
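For reference, Halstead's effort combines the unique counts (n1 operators, n2 operands) with the total counts (N1, N2): volume V = N log2 n with N = N1 + N2 and n = n1 + n2, difficulty D = (n1 / 2)(N2 / n2), and E = D * V. The sketch takes the four counts as inputs, so the counting ambiguities discussed under N are left to the caller. The example counts are invented so that the logarithm is exact, and the function name is hypothetical.

```python
import math


def halstead_effort(n1, n2, N1, N2):
    """Halstead's effort E = D * V.
    n1, n2: unique operators / operands; N1, N2: total operators / operands."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)  # V = N log2 n
    difficulty = (n1 / 2) * (N2 / n2)        # D = (n1 / 2)(N2 / n2)
    return difficulty * volume


# n = 8 and N = 16, so V = 16 * 3 = 48 and D = 2 * 2 = 4.
print(halstead_effort(4, 4, 8, 8))  # 192.0
```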

v(G), η1

Clear and Unambiguous: The calculation of v(G) was explained above. Counting
the number of unique operators may be difficult, as the N section explains.

Intuitive: Adding operators to the branches should show more about the
module's complexity than either measure does separately.

Language Independent: While v(G) is language independent, the operator count
is not.

Prescriptive: This follows from the description of v(G)'s prescriptiveness.

Robustness: A change in either the number of branches or the number of unique
operators will be reflected by the metric. Any change in data will not be
reflected.

Accurately Reflect Control Flow: This will reflect control flow as well as
v(G) does.

Ranking Basic Control Structures: This ranks structures as well as v(G) does.

Nesting and Compound Conditions: This has v(G)'s limitations in reflecting
nesting.

Accurately Reflect Data Flow: NA

Indicates Data Amount: NA

Shows Data Use: NA

Reflects Inter-Module Data Links: NA


C 

Clear and Unambiguous: This is a very complex metric to calculate.

Intuitive: Combining a data structure metric with a control flow metric
gains the benefits explained for a hybrid metric.

Language Independent: This does not reflect any particular language, as long
as a procedural language is used.

Prescriptive: A high data flow or control flow component will suggest that
the module needs to be decomposed, but does not explain how best to decompose
the module.

Robustness: Any change of "locally exposed" data references will be
reflected, and changes in the control flow will be reflected.

Accurately Reflect Control Flow: This reflects the number of branches in a
module.

Ranking Basic Control Structures: This ranks sequential statements as less
complex than conditions.

Nesting and Compound Conditions: This reflects nesting, and reflects the use
of data within nested statements.

Accurately Reflect Data Flow: This shows the use of data and the amount of
data that are "locally exposed".

Indicates Data Amount: The data flow factor reflects the number of variables
used.

Shows Data Use: The use of data within a module is reflected within each node
of the module.

Reflects Inter-Module Data Links: This is an intra-module metric.


Appendix B: Algorithms for Metric Value Computation
This section explains algorithms for calculating the metric values for
both MEBOW and INFO. The INFO calculations are given first, then the MEBOW
algorithms are explained. Each section has an explanation, followed by
a pseudocode representation of the algorithm.

Information Flow Calculation. This calculation requires a list of the
reserved words for the language in which the module under evaluation is
written. This algorithm assumes that a parser is available that can return
tokens for the language being used, along with a list of parameters from the
module's calling statement. The algorithm goes through the module, looking for
valid identifiers, and compares each identifier to this list of reserved words.
If the identifier is in the list, it is discarded and the next identifier is
evaluated. When the module's parameter declarations are evaluated, any
identifier used as an input variable is added to the input list and any
identifier used as an output variable is added to the output list. Any
variables declared locally are placed in the reserved word list so they will
not be counted as input or output data flows.

If the identifier is not in the list, it is compared to either a list of input
identifiers or a list of output identifiers, depending on how the identifier is
used. If the identifier is on the left side of an assignment, it is
compared to the output identifiers. If it is used in any other expression, it
is compared to the list of input identifiers. If the identifier is not in the
appropriate list, it is added to that list. This operation continues until
the end of the module is reached.


-- This section adds the module's parameters to the input and output
-- identifier lists.

REPEAT
    INPUT the next parameter identifier
    IF the identifier is an input parameter
    THEN
        ADD the identifier to the list of input identifiers
    ELSE
        ADD the identifier to the list of output identifiers
    ENDIF
UNTIL no more parameters are found

-- This section adds the module's locally declared variables to the list
-- of reserved words so they will not be counted for the INFO value.

REPEAT
    IF a valid variable declaration is found
    THEN
        ADD the identifier to the list of reserved words
    ENDIF
UNTIL the beginning of the program body is found

-- This section looks for identifiers within the body of the module and
-- adds these identifiers to the input and output identifier lists.

REPEAT
    INPUT an identifier
    COMPARE the identifier to the list of reserved words
    IF the identifier is not in this list
    THEN
        IF the identifier is on the left side of an assignment
        THEN
            COMPARE the identifier to the list of output identifiers
            IF the identifier is not in this list
            THEN
                ADD the identifier to the list of output identifiers
            ENDIF
        ELSE (the identifier is not being assigned a value)
            COMPARE the identifier to the list of input identifiers
            IF the identifier is not in this list
            THEN
                ADD the identifier to the list of input identifiers
            ENDIF
        ENDIF
    ENDIF
UNTIL the end of the module is reached

CALCULATE INFO as (the number of input identifiers *
    the number of output identifiers) ** 2
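The INFO pseudocode can be sketched in Python as follows. The sketch assumes a hypothetical pre-parsed input in place of the parser the algorithm requires: parameters arrive as (name, mode) pairs, and each statement is reduced to the identifiers it assigns and the identifiers it uses. Python sets replace the "compare, then add if missing" list steps, and Ada's case-insensitivity is not handled. The names Statement and info_value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical pre-parsed statement: identifiers on the left side of an
    assignment, and identifiers used anywhere else."""
    assigned: list = field(default_factory=list)
    used: list = field(default_factory=list)


def info_value(params, local_vars, body, reserved_words):
    inputs, outputs = set(), set()  # sets stand in for the identifier lists
    # Parameters: "in" goes to inputs; any other mode ("out", "in out") goes
    # to outputs, mirroring the ELSE branch of the pseudocode.
    for name, mode in params:
        if mode == "in":
            inputs.add(name)
        else:
            outputs.add(name)
    # Local declarations join the reserved words so they are never counted.
    ignored = set(reserved_words) | set(local_vars)
    for stmt in body:
        for name in stmt.assigned:  # left side of an assignment -> output
            if name not in ignored:
                outputs.add(name)
        for name in stmt.used:      # any other use -> input
            if name not in ignored:
                inputs.add(name)
    return (len(inputs) * len(outputs)) ** 2


params = [("Count", "in"), ("Price", "in"), ("Total", "out")]
body = [
    Statement(assigned=["Total"], used=["Count", "Price"]),
    Statement(assigned=["Temp"], used=["Total"]),
    Statement(assigned=["Global_Sum"], used=["Temp", "Global_Sum"]),
]
print(info_value(params, ["Temp"], body, {"and", "or", "not"}))  # (4 * 2) ** 2 = 64
```

Total is counted both as an output (it is an out parameter and is assigned) and as an input (it is read in the second statement), exactly as the pseudocode would count it.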

MEBOW Calculation. Each statement within the module that is either a
branch or the target of a branch is tracked by its line number. A list
of branch/target pairs is kept, along with a determination of the type of
branch. Branches are either implicit or explicit, and either backward or
forward, and each type has a different weight in the MEBOW calculation. Once
all of the branch/target pairs have been determined for the module, each
branch's value is calculated.

Each branch is assigned the weight of its type. For example, if a
branch from line 13 to line 17 is implicit, the branch (13, 17) is assigned 1,
and the explicit branch (16, 8) is assigned 6. Then, the scope of each branch
is calculated. A list of the branches that are within the scope of each branch
is determined, along with any knots within that scope. If a branch has no other
branches or knots within its scope, it is flagged as completed. This list of
the branches and knots within the scope of a branch is the list of values the
branch depends on, or its dependency list.
The value for each branch is calculated as the sum of its weight and the
values of the branches and knots in its dependency list. If each of these
branches and knots has its completed flag set, then the current branch's
value can be calculated and its completed flag set. If any of these
branches or knots is not completed, the current branch is bypassed for later
calculation. This is an iterative process, which continues through the list
of branches until all dependencies have been completed. The sum of all the
branches' values is the returned MEBOW value.

A case statement is treated somewhat differently. A list of the line numbers
for its selections is kept, along with their count. Its value is 2 * the
number of selections, plus the value of each selection. These values are
calculated in the same way as for any other statement, and dependency lists
are kept for case statements as well. Once each selection's dependencies have
been evaluated, the case statement's value can be calculated.

-- This creates the list of branches.
REPEAT
    IF the statement is a branch
    THEN
        ADD the line number to the list of branches, along with a
            determination of what type of branch it is
    ELSE
        IF the statement is the target of a branch
        THEN
            ADD the line number to its corresponding branch, to
                create a (FROM, TO) representation
        ENDIF
    ENDIF
UNTIL the last statement is input

-- This generates the list of dependencies for each branch and determines
-- if any knots have occurred. Any knots are kept in a separate list,
-- and their dependencies are also generated.

REPEAT
    REPEAT
        COMPARE the line numbers for the current branch to the next other branch
        IF overlap of line numbers exists
        THEN
            IF the criterion for a knot exists
            THEN
                ADD the knot to the list of knots
            ENDIF
            ADD the dependency to the current branch
        ENDIF
    UNTIL the current branch has been compared to every other branch
    IF the current branch has no dependencies
    THEN
        SET the completion flag to true
    ELSE
        SET the completion flag to false
    ENDIF
UNTIL all branches and knots have had their dependencies evaluated

-- This goes through the lists of branches and knots and determines
-- which have enough information available to calculate their value.
-- If they depend on the value of a branch or knot that is not yet
-- known, then pass to the next branch or knot and try to calculate
-- its value.

REPEAT
    IF the completion flag of a branch or knot is false
    THEN
        CHECK the completion flags for each branch and knot in its
            dependency list
        IF all of the completion flags are true
        THEN
            ADD the values for each branch and knot in the
                dependency list to its own weight
            SET the completion flag to true
        ENDIF
    ENDIF
UNTIL all completion flags are set

-- The MEBOW value is the sum of the values for the branches and the knots.
SET the SUM to zero
REPEAT
    ADD the value of the current branch or knot to the SUM
UNTIL all branches and knots have been added

-- The MEBOW value is now known.
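The last two pseudocode sections, which resolve dependency lists iteratively and then sum the values, can be sketched in Python. The representation is an assumption: each branch, knot, or case statement is a name mapped to its own weight and the names on its dependency list, so computing weights, scopes, and knots from source code is not shown. Following the text, a case statement's weight is 2 * its number of selections; the other weights in the example are illustrative only, and the function name is hypothetical.

```python
def mebow_value(items):
    """items maps a name to (weight, dependency_names).
    Returns (total MEBOW value, per-item values)."""
    values = {}             # completed items and their values
    pending = dict(items)   # items whose completion flag is still false
    while pending:
        progressed = False
        for name, (weight, deps) in list(pending.items()):
            # An item can be completed once everything it depends on is completed.
            if all(dep in values for dep in deps):
                values[name] = weight + sum(values[dep] for dep in deps)
                del pending[name]
                progressed = True
        if not progressed:
            # No item could be completed on this pass: a cycle or an unknown name.
            raise ValueError("unresolvable dependencies: " + ", ".join(sorted(pending)))
    return sum(values.values()), values


items = {
    "W1": (7, ["I1"]),      # loop enclosing I1 (illustrative weight)
    "I1": (3, []),          # IF...THEN with nothing in its scope
    "C1": (2 * 2, ["I2"]),  # case statement with 2 selections, enclosing I2
    "I2": (4, []),          # IF...THEN...ELSE with nothing in its scope
}
total, values = mebow_value(items)
print(total)   # 25
print(values)  # {'I1': 3, 'I2': 4, 'W1': 10, 'C1': 8}
```

The loop mirrors the completion flags: an item is completed once every item on its dependency list is completed. Raising an error when a full pass makes no progress guards against the endless loop the pseudocode would enter on a circular or misspelled dependency.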
Appendix C: Empirical Support for Hybrid Metrics
Empirical Evidence.

In Chapter Four, descriptions of four studies that use hybrid metrics
were given. The data and results that came from these studies are presented
here in more detail. This section shows the data for three of the four
studies.

Kafura and Canning's Study. Kafura and Canning's research identified 32
components that were extreme outliers, or the most error-prone components,
from among the 170 components they studied. They analyzed these 32
components with their ten metrics and found that 28/32 extreme outliers
were identified by at least one of the ten metrics (Kafura and Canning,
1985:382). The best result that any single metric had was 20/32, achieved by
the INFO-LOC hybrid measure. Figure 14 shows a matrix of the metrics and
the extreme outlier components they identified.

This analysis was also performed on all the outliers, not just the
extreme outliers. The number of components in this category totaled 85.
INFO-LOC identified 42/85 outliers, the best result of any metric used.
This metric also identified the second fewest number of non-outliers as
outliers. All of the metrics incorrectly identified some components as
outliers, but INFO-LOC's ratio of correctly identified outliers to total
outliers reported was 42/63, which is beaten only by LOC's yield of 41/59.
While other metrics identified fewer non-outliers as outliers, they also did
not recognize as many of the correct outliers.
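The ratios quoted above are easier to compare as percentages. The following computes them from the reported counts; "recall" and "precision" are modern labels applied here for convenience, not terms used by Kafura and Canning.

```python
from fractions import Fraction

# Counts reported above for Kafura and Canning's study.
results = {
    "best single metric, extreme outliers (INFO-LOC)": Fraction(20, 32),
    "any of the ten metrics, extreme outliers": Fraction(28, 32),
    "INFO-LOC recall, all outliers": Fraction(42, 85),
    "INFO-LOC precision": Fraction(42, 63),
    "LOC precision": Fraction(41, 59),
}
for label, ratio in results.items():
    print(f"{label}: {float(ratio):.1%}")
```

LOC's precision of about 69.5% edges out INFO-LOC's 66.7%, while INFO-LOC recognizes one more of the 85 outliers (42 against 41).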

Figure 14. Identification of Extreme Outlier Error Components
(Kafura and Canning, 1985:382)

Harrison and Cook's Study. Harrison and Cook's results showed that
their hybrid metric was able to identify the most error-prone modules better
than any other metric used. The results in Figure 15 show that MMC, as a
hybrid of different types of metrics, outperforms its component parts. They
wrote, "as can be seen, the MMC metric performed significantly better than any
of the other metrics examined" (Harrison and Cook, 1987:217). The MMC metric
had a .82 correlation with the number of errors found in the modules tested.
This metric was "based loosely" on the Hall and Preiser Combined Network
Complexity metric and Henry and Kafura's INFO, with a microcomplexity metric
of v(G) (ibid:215-216).

Ramamurthy and Melton's Study. Ramamurthy and Melton did not perform a
statistical analysis of the quality of their weighted metrics. Instead, they
compared the unweighted and weighted Software Science and v(G)
metrics against 24 pairs of test programs and program segments. In three
tables, each pair is shown with the first program as the more complex of the
two. No justification was given for how the first program was identified as
the more complex one.

Their first table showed six program pairs with the same Software Science
values but different v(G). Their weighted metrics identified the first
program of the test pair as more complex in all cases. Their second table
showed eight pairs with the same v(G) but different Software Science
values. In all cases, the weighted effort showed the first program in the
test pair as more complex. In one case, the weighted length and volume
measures incorrectly identified the less complex program as the more complex
one, although the weighted effort for the same pair was correct. Their third
table showed ten pairs with different Software Science and v(G) values. The
weighted length and volume metrics correctly identified all ten pairs. The
weighted effort metric incorrectly identified one pair. Overall, the weighted
metrics correctly identified the more complex program in 23/24 pairs. The
Software Science metrics correctly identified 16/24, and v(G) correctly
identified 15/24 (Ramamurthy and Melton, 1986:312). This suggests that a
hybrid measure identifies complexity better than a single metric can.


Appendix D: Calculation of Metric Value for an Ada Procedure

This appendix shows the calculation of both metrics for an Ada procedure.
This procedure was taken from a program that calculates v(G) for Ada programs,
and is in the public domain. First the MEBOW calculation is presented, then a
list of the input and output variables is given with a calculation of
information flow.

The MEBOW calculation for this procedure will not be presented in the
same format as Figure 12. The flow graph for this example is complicated and
would not add to the comprehensibility of the example. Instead, Figure 16 shows
MEBOW values for five basic control structures. These structures are labeled in
the example. Their MEBOW values, which are their basic values added to their
scope, are presented after the example.

The first line of each control structure branch used in the MEBOW calculation
is labeled with a designator for reference during the MEBOW value calculation.
The designator is "C" for a case statement, "I" for an if statement, and "W"
for a while loop. These designators are numbered sequentially, so "I3" refers
to the third occurrence of an if statement.

Figure 16 shows the MEBOW values for the if statements and while
statements used, but does not refer to the case statements. Each case
statement's MEBOW value is twice the number of its branches, which are its
enclosed "when" statements. To this value, the values of the structures within
its scope are added to give the case statement's MEBOW value. For example, the
case statement C4 has three branches, and encloses C5, C6, and I5 within its
scope. Therefore, its value is (2 * 3) + C5 + C6 + I5 = 29.
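Written out, the C4 calculation also fixes the combined value of the three enclosed structures, which the text does not list individually:

```latex
\mathrm{MEBOW}(C4) = 2 \cdot 3 + \mathrm{MEBOW}(C5) + \mathrm{MEBOW}(C6) + \mathrm{MEBOW}(I5) = 29
\quad\Longrightarrow\quad
\mathrm{MEBOW}(C5) + \mathrm{MEBOW}(C6) + \mathrm{MEBOW}(I5) = 23
```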


[Figure 16 panels: (a) Sequence, MEBOW = 0; (b) Selection, IF...THEN, MEBOW = 3; (c) Selection, IF...THEN...ELSE, MEBOW = 4; (d) Repetition, REPEAT...UNTIL, MEBOW = 5; (e) Repetition, WHILE, MEBOW = 7]

Figure 16. MEBOW Basic Control Constructs
(Jayaprakash and others, 1987:240)
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procedure  Scan_Nuineric_Literal ;  --1  Scans  numbers 

--1  Requires 

■  “  I 

---1  This  subprogram  requires  an  opened  source  file,  and  the 
"■1  Universal  Arithmetic  package  to  handle  conversions. 


E  f  f  e  c  t  -S 


This  subprogram  scarus  the  rest  of  the  numeric  literal  and  converts 
it  to  internal  universal  number  format. 


Modifies 

CST 


procedure  Scan_Numeric_Literal  is 


Overview 


--1  Note  the  following  LRM  Sections: 

--1  LRM  Section  2.4  -  Numeric  Literals 

--1  LRM  Section  2.4.1  -  Decimal  Literals 

--1  LRM  Section  2.4.1  -  Notes 

--1  LRM  Section  2.4.2  -  Based  Literals 

--1  LRM  Section  2.10  -  Allowed  Replacements  of  Characters 


--  Declarations  for  Scan  Numeric  Literal 


Bastd_Li teral_Delimiter  :  character; 

--1  holds  value  of  first  based_literal  delimeter: 

-“I  ASCI  I.  COLON  or  ASCII.  SHARP  Cf); 

--|  so  the  second  one  can  be  checked  to  be  identical. 

Base_Bei ng_ Used  :  GC . Parser  Integer ; 

1  base  value  to  be  passed  to  Scan_Based_Li teral . 


begin 

CST  -  gr  am_sy!ii_  va  1  :=■  PT.Numer  icTokenValut; 

Work  String . Length  0; 

also  used  by  sub  scanners  called  from  this  subprogram. 
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Scan  Lirst  faeld 
Sc?,n_Integer  ; 

■-  No{>’,  scan  res.  of  literal  dependent  on  what  Next_char  is 
Cl  case  Next  jlhar  is 

      -- have a decimal_literal
      when '.' =>

         if (Look_Ahead (1) = '.') then                       -- I1
            -- next token is a range double delimiter;
            -- finished with numeric_literal.
            Seen_Radix_Point := false;  -- have an integer_literal
            -- already set up for next scanner,
            -- no call to Get_Next_Char.
         else
            Seen_Radix_Point := true;
            Add_Next_Char_To_Source_Rep;
            Get_Next_Char;

            case Next_Char is                                 -- C2
               when Digit =>
                  Scan_Integer;
                  -- check and flag multiple radix points
                  while (Next_Char = '.') and then
                        (Look_Ahead (1) in Digit) loop        -- W1
                     LEM.Output_Message
                       ( Current_Line
                       , Current_Column
                       , LEM.Too_Many_Radix_Points);
                     Add_Next_Char_To_Source_Rep;
                     Get_Next_Char;
                     Scan_Integer;
                  end loop;

               when ASCII.UNDERLINE =>
                  -- flag illegal leading underline
                  LEM.Output_Message
                    ( Current_Line
                    , Current_Column
                    , LEM.Leading_Underline);
                  Scan_Integer;
                  -- not flagging an integer consisting of a
                  -- single underline as a trailing radix
                  -- point case. Check and flag multiple radix
                  -- points.
                  while (Next_Char = '.') and then
                        (Look_Ahead (1) in Digit) loop        -- W2
                     LEM.Output_Message
                       ( Current_Line
                       , Current_Column
                       , LEM.Too_Many_Radix_Points);
                     Add_Next_Char_To_Source_Rep;
                     Get_Next_Char;
                     Scan_Integer;
                  end loop;

               when others =>
                  -- flag trailing radix point as an error
                  LEM.Output_Message
                    ( Current_Line
                    , Current_Column
                    , LEM.Digit_Needed_After_Radix_Point);
            end case;

            Scan_Exponent;  -- check for and process exponent

         end if;

      -- have a based_literal
      when ASCII.SHARP |  -- '#'
           ASCII.COLON =>  -- ':'

         if (Next_Char = ASCII.COLON) and (Look_Ahead (1) = '=') then  -- I2
            -- next token is an assignment compound delimiter;
            -- finished with numeric literal.
            Seen_Radix_Point := false;  -- have an integer literal
            -- already set up for next scanner, no call to
            -- Get_Next_Char.
         else
            Based_Literal_Delimiter := Next_Char;

            Base_Being_Used := GC.ParserInteger'VALUE
                                 (Work_String (1 .. Work_String_Length));
            if (Base_Being_Used not in Valid_Base_Range) then  -- I3
               -- flag illegal bases as errors
               LEM.Output_Message
                 ( Current_Line
                 , Current_Column
                 , Work_String (1 .. Work_String_Length)
                 , LEM.Base_Out_Of_Legal_Range_Use_16);
               Base_Being_Used := 16;
               -- we use the maximum base to pass all the
               -- extended_digits as legal.
            end if;

            Add_Next_Char_To_Source_Rep;  -- save the base delimiter
            Get_Next_Char;

            case Next_Char is                                 -- C3
               when 'A' .. 'F' | 'a' .. 'f' | Digit =>
                  Scan_Based_Integer (Base_Being_Used);
               when ASCII.UNDERLINE =>
                  -- flag illegal leading underline
                  LEM.Output_Message
                    ( Current_Line
                    , Current_Column
                    , LEM.Leading_Underline);
                  -- not flagging an integer consisting of a single
                  -- underline as a trailing radix point case.
                  Scan_Based_Integer (Base_Being_Used);
               when '.' =>
                  -- flag leading radix point as an error
                  LEM.Output_Message
                    ( Current_Line
                    , Current_Column
                    , LEM.Digit_Needed_Before_Radix_Point);
               when ASCII.SHARP |  -- '#'
                    ASCII.COLON =>  -- ':'
                  -- flag missing field as an error
                  LEM.Output_Message
                    ( Current_Line
                    , Current_Column
                    , LEM.No_Integer_In_Based_Number);
                  -- based_literal_delimiter_mismatch handled in
                  -- next case statement.
               when others =>
                  -- flag missing field as an error
                  LEM.Output_Message
                    ( Current_Line
                    , Current_Column
                    , LEM.No_Integer_In_Based_Number);
            end case;


            case Next_Char is                                 -- C4
               when '.' =>
                  Seen_Radix_Point := true;  -- have a real_literal
                  Add_Next_Char_To_Source_Rep;
                  Get_Next_Char;

                  case Next_Char is                           -- C5
                     when 'A' .. 'F' | 'a' .. 'f' | Digit =>
                        Scan_Based_Integer (Base_Being_Used);
                        -- check and flag multiple radix points
                        while (Next_Char = '.') and then
                              ((Look_Ahead (1) in Digit) or
                               (Look_Ahead (1) in 'A' .. 'F') or
                               (Look_Ahead (1) in 'a' .. 'f')) loop  -- W3
                           LEM.Output_Message
                             ( Current_Line
                             , Current_Column
                             , LEM.Too_Many_Radix_Points);
                           Add_Next_Char_To_Source_Rep;
                           Get_Next_Char;
                           Scan_Based_Integer (Base_Being_Used);
                        end loop;

                     when ASCII.UNDERLINE =>
                        -- flag illegal leading underline
                        LEM.Output_Message
                          ( Current_Line
                          , Current_Column
                          , LEM.Leading_Underline);
                        -- not flagging an integer consisting of
                        -- a single underline as a trailing
                        -- radix point case.
                        Scan_Based_Integer (Base_Being_Used);

                     when others =>
                        -- flag trailing radix point as an error
                        LEM.Output_Message
                          ( Current_Line
                          , Current_Column
                          , LEM.Digit_Needed_After_Radix_Point);
                  end case;

                  case Next_Char is                           -- C6
                     when ASCII.SHARP |  -- '#'
                          ASCII.COLON =>  -- ':'
                        Add_Next_Char_To_Source_Rep;
                        -- save the base delimiter
                        if (Next_Char /= Based_Literal_Delimiter)
                        then                                  -- I4
                           -- flag based_literal delimiter
                           -- mismatch as an error
                           LEM.Output_Message
                             ( Current_Line
                             , Current_Column
                             , "Opener: " & Based_Literal_Delimiter
                               & " Closer: " & Next_Char
                             , LEM.Based_Literal_Delimiter_Mismatch);
                        end if;

                        Get_Next_Char;  -- after base delimiter
                        -- check for and process exponent
                        Scan_Exponent;

                     when others =>
                        -- flag missing second
                        -- based_literal delimiter as an error
                        LEM.Output_Message
                          ( Current_Line
                          , Current_Column
                          , LEM.Missing_Second_Based_Literal_Delimiter);
                  end case;


               when ASCII.SHARP |  -- '#'
                    ASCII.COLON =>  -- ':'
                  -- have an integer_literal
                  Seen_Radix_Point := false;

                  -- save the base delimiter
                  Add_Next_Char_To_Source_Rep;

                  if (Next_Char /= Based_Literal_Delimiter) then  -- I5
                     -- flag based_literal delimiter mismatch error
                     LEM.Output_Message
                       ( Current_Line
                       , Current_Column
                       , "Opener: " & Based_Literal_Delimiter
                         & " Closer: " & Next_Char
                       , LEM.Based_Literal_Delimiter_Mismatch);
                  end if;

                  Get_Next_Char;  -- get character after base
                                  -- delimiter
                  Scan_Exponent;  -- check for and process exponent

               when others =>
                  -- assume an integer_literal
                  Seen_Radix_Point := false;
                  -- flag missing second
                  -- based_literal delimiter as an error
                  LEM.Output_Message
                    ( Current_Line
                    , Current_Column
                    , LEM.Missing_Second_Based_Literal_Delimiter);
            end case;
         end if;

      -- we have an integer_literal
      when others =>
         Seen_Radix_Point := false;  -- have an integer_literal
         Scan_Exponent;  -- check for and process exponent

   end case;

   -- one last error check
   if (Next_Char in Upper_Case_Letter) or
      (Next_Char in Lower_Case_Letter) then                   -- I6
      -- flag missing space between numeric_literal and
      -- identifier (including RW) as an error.
      LEM.Output_Message
        ( Current_Line
        , Current_Column
        , LEM.Space_Must_Separate_Num_And_Ids);
   end if;

   -- now store the source representation of the token found.
   Set_CST_Source_Rep (Work_String (1 .. Work_String_Length));

end Scan_Numeric_Literal;


This example has six case statements, six if statements, and three while
statements. Each statement's value is calculated as its basic value plus its
scope, that is, plus the values of the branching statements nested directly
inside it. No knots exist in this example to be counted.
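The rule can be written compactly as a recursion. This is a sketch read off the numbers in this example, not a restatement of the metric's general definition: the basic values b(s) (2n for a case statement with n alternatives, 4 for an if statement with an else part, 3 for an if statement without one, and 7 for a while loop) are simply the values that the calculations below imply, and scope(s) denotes the branching statements nested directly inside s.

```latex
v(s) = b(s) + \sum_{t \in \mathrm{scope}(s)} v(t),
\qquad
\mathrm{MEBOW} = \sum_{s \in S} v(s)
```

Here S is the set of all fifteen branching statements (C1 to C6, I1 to I6, W1 to W3); knots, which would otherwise also be counted, do not occur in this procedure.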
Branches:

C1 = (2 * 3) + I1 + I2 = 76
C2 = (2 * 3) + W1 + W2 = 20
C3 = (2 * 5) = 10
C4 = (2 * 3) + C5 + C6 + I5 = 29
C5 = (2 * 3) + W3 = 13
C6 = (2 * 2) + I4 = 7
I1 = 4 + C2 = 24
I2 = 4 + C3 + C4 + I3 = 46
I3 = 3
I4 = 3
I5 = 3
I6 = 3
W1 = 7
W2 = 7
W3 = 7

MEBOW = 258
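The total of 258 is the sum of every statement value in the list, not only of the two outermost statements C1 and I6. Because each value already includes the values of the statements nested inside it, an inner statement is counted once for itself and again for every statement that encloses it; W1, for instance, contributes its 7 to W1, C2, I1, and C1.

```latex
\mathrm{MEBOW}
  = \underbrace{(76 + 20 + 10 + 29 + 13 + 7)}_{C1 \ldots C6 \,=\, 155}
  + \underbrace{(24 + 46 + 3 + 3 + 3 + 3)}_{I1 \ldots I6 \,=\, 82}
  + \underbrace{(7 + 7 + 7)}_{W1 \ldots W3 \,=\, 21}
  = 258
```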

It is interesting to note that this MEBOW value is only slightly larger
than the value for the much shorter example given in Figure 12. The reason is
that this example is well-structured code: it has no explicit backward
branches, and no knots exist. The procedure's execution always proceeds from
top to bottom, except for the implicit backward branches introduced by the
while statements.

The information flow calculation for this procedure is quite simple. All of
the variables used, other than the three locally declared variables, are global.
If these variables appear on the left side of an assignment, they are counted
for fan-out; if they are used in any other position, they are counted for
fan-in. One variable, Work_String_Length, is used in both ways and is counted
for both fan-out and fan-in.

Variables counted for fan-out:

CST.gram_sym_val
Work_String_Length
Seen_Radix_Point
Variables counted for fan-in:

PT.NumericTokenValue
Next_Char
Look_Ahead
Digit
Current_Line
Current_Column
LEM.Too_Many_Radix_Points
LEM.Leading_Underline
LEM.Digit_Needed_After_Radix_Point
GC.ParserInteger'VALUE
Work_String
Work_String_Length
Valid_Base_Range
LEM.Base_Out_Of_Legal_Range_Use_16
LEM.Leading_Underline
LEM.Digit_Needed_Before_Radix_Point
LEM.No_Integer_In_Based_Number
LEM.Based_Literal_Delimiter_Mismatch
LEM.Missing_Second_Based_Literal_Delimiter
Upper_Case_Letter
Lower_Case_Letter
LEM.Space_Must_Separate_Num_And_Ids

INFO is defined in equation 4 as (fan-in * fan-out) ** 2. The number of
variables counted for fan-in is 22. The number of variables counted for
fan-out is three.

INFO = (22 * 3) ** 2 = 4356